CN115460710A - Intelligent calculation unloading method in vehicle edge calculation scene based on deep reinforcement learning - Google Patents
Intelligent calculation unloading method in vehicle edge calculation scene based on deep reinforcement learning
- Publication number
- CN115460710A (application number CN202211048897.6A)
- Authority
- CN
- China
- Prior art keywords
- decision
- vehicle
- unloading
- vehicle terminal
- computing
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
Abstract
The invention provides an intelligent computation offloading method for vehicle edge computing scenarios based on deep reinforcement learning. Using an online deep-reinforcement-learning framework, it jointly optimizes, according to the time-varying nature of the wireless channel, each vehicle terminal's offloading decision, the vehicle's local computing capability, the vehicle's data transmission power, the channel time-slot resource allocation decision, and the edge server's computing resource allocation decision, thereby minimizing the computation delay of the system and obtaining the optimal offloading decision. Compared with traditional heuristic algorithms, the method adopts deep reinforcement learning, combining the strong representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in a highly dynamic Internet-of-Vehicles environment. The method converges quickly to the optimal offloading policy under time-varying wireless channels; when the weight of a vehicle terminal changes, the offloading policy is adjusted automatically and quickly converges to a new optimum, showing strong robustness.
Description
Technical Field
The invention relates to the field of mobile edge computing, and in particular to an intelligent online offloading method for vehicle edge computing scenarios based on deep reinforcement learning.
Background
In recent years, with the continuous development of Internet-of-Vehicles technology and the steady growth of vehicle ownership, a large number of vehicular applications and multimedia services have emerged. These impose higher requirements on service quality, user experience, and system overhead, and demand more resources such as computing capacity and energy. The computing resources and energy storage on board a vehicle cannot adequately handle the computation-intensive, delay-sensitive tasks that are widespread in today's new vehicular applications. As a way to address this shortage of computing resources, computation offloading has become a hot research topic in the Internet of Vehicles.
Computation offloading means transmitting, i.e., "offloading," computing tasks to a server with idle resources for execution and transmitting the results back, thereby alleviating the shortage of local computing resources. Cloud computing is currently a relatively mature offloading approach, in which computing tasks are offloaded to the cloud for execution; however, it is unsuitable for vehicular scenarios because the transmission delay between the cloud and the vehicle is too long.
Mobile Edge Computing (MEC) addresses this by offloading computing tasks to edge servers closer to the vehicle. Combining MEC with the Internet of Vehicles in a vehicular scenario yields Vehicle Edge Computing (VEC), a currently promising and effective way to improve vehicular application performance: it can significantly reduce the delay and energy consumption a vehicle terminal incurs when executing computing tasks. Nevertheless, because a vehicle's battery life and capacity are limited, it is difficult to guarantee the performance of vehicle-mounted applications over long periods.
Wireless Power Transfer (WPT) is the process of delivering energy from an energy source to an electrical load wirelessly rather than over conventional wires. The energy source transfers energy to wireless devices over the air, ensuring they have sufficient energy to handle various tasks. Recent studies have demonstrated the feasibility of wireless energy transmission.
Because vehicle battery life and energy are limited, the invention adds wireless energy transfer: the mobile edge server transmits energy to the vehicles, further reducing their energy consumption. Combining WPT with the VEC network lets the edge server replenish vehicle energy wirelessly, guaranteeing and improving vehicle-mounted application performance and user service experience; this yields wireless-powered mobile edge computing. In a wireless fading environment with multiple users, a major challenge is to jointly optimize each user's computation mode (offloaded or local computation) and the radio resource allocation. Because of the binary offloading variables, such problems are typically modeled as Mixed Integer Programming (MIP) problems. Solving the MIP problem with traditional branch-and-bound or dynamic programming incurs extremely high computational complexity and cannot be applied in environments that change in real time; heuristic local search and convex relaxation reduce the complexity, but both require many iterations to reach a satisfactory local optimum and are unsuitable for making real-time offloading decisions over fast-fading channels.
Disclosure of Invention
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, targeting a wireless-powered mobile edge computing network comprising an edge server and a plurality of vehicle terminals.
The technical scheme of the invention is as follows:
The intelligent computation offloading method in the vehicle edge computing scenario based on deep reinforcement learning comprises the following steps:
Step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network;
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions with an order-preserving quantization method;
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0,1}
and constraints C2 through C5, where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N), with x_i = 1 denoting that the computing task of vehicle terminal i is offloaded to the edge server and x_i = 0 denoting that it is executed locally; f_M is the total computing resource owned by the edge server, and f_i is the computing resource the edge server allocates to vehicle terminal i; the edge computing total delay and the local computing delay of vehicle terminal i enter the objective weighted by w_i, the weighting factor of the task computation delay of vehicle terminal i; a is the fraction of channel time during which the edge server transmits energy to the vehicle terminals, and τ_i is the fraction of channel time allocated to vehicle terminal i;
Transform problem P1 into the resource allocation sub-problem P2:
s.t. C2, C3, C4, C5
where φ is the number of CPU cycles required to process 1 bit of task data; D_i is the data volume of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmit power of the edge server; h_i is the wireless channel gain of vehicle terminal i in the current time frame; k_i is the computing energy-efficiency coefficient of vehicle terminal i; β_i is the uplink transmission overhead coefficient of vehicle terminal i; X_0 and X_1 denote the sets of vehicles adopting local computing and offloaded computing, respectively; W is the bandwidth of the wireless channel; σ² is the Gaussian noise spectral density of the wireless channel;
Step 4: decompose problem P2 into two sub-problems, P3 and P4:
P3 (time-slot allocation): s.t. C4, C5
P4 (computing-resource allocation): s.t. C2, C3
Solve problem P3 to obtain the time-slot allocation {a, τ}; solve problem P4 to obtain the computing-resource allocation f;
Step 5: for each binary offloading decision, substitute the solutions of problems P3 and P4 back into problem P1 and compute the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision;
Step 6: store the obtained optimal offloading decision together with the wireless channel gain h_i in a memory as experience-labeled data;
Step 7: every δ time frames, randomly select a data sample from the memory, train the deep neural network, and update its parameter θ; then return to step 1 until the method terminates.
Further, the edge computing total delay of vehicle terminal i is the sum of its uplink data transmission delay and the execution delay required by the edge server to process the task transmitted by vehicle terminal i.
Further, in step 2, the process of quantizing the relaxed offloading decision x_t into K binary offloading decisions is as follows:
For a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t.
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise, where x_{t,i} is the i-th element of x_t.
Order the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Further, in step 4, the interior-point method is used to solve problem P3, obtaining the time-slot allocation {a, τ}; the Lagrangian dual method is used to solve problem P4, obtaining the computing-resource allocation f.
Advantageous effects
Compared with traditional heuristic algorithms, the method adopts deep reinforcement learning, combining the strong representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in a highly dynamic Internet-of-Vehicles environment.
The invention converges quickly to the optimal offloading policy under a time-varying wireless channel; when the weight of a vehicle terminal changes, it automatically adjusts the offloading policy and quickly converges to a new optimal policy, showing strong robustness.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1: a simplified model diagram of problem P1;
FIG. 2: a framework diagram of the method;
FIG. 3: the training loss curve of the neural network;
FIG. 4: the gain ratio curve;
FIG. 5: the impact of the learning rate on the gain ratio;
FIG. 6: the impact of the batch size on the gain ratio;
FIG. 7: the impact of the memory size on the gain ratio;
FIG. 8: the impact of the training interval on the gain ratio;
FIG. 9: training loss variation when the weights are modified;
FIG. 10: gain ratio variation when the weights are modified.
Detailed Description
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, in which the edge server broadcasts energy to the vehicle terminals, and the vehicle terminals harvest this energy to execute their tasks. The invention adopts binary offloading: a task is either computed locally at the vehicle terminal or offloaded in full to the edge server for computation. According to the time-varying conditions of the wireless channel, the system computation delay is minimized by jointly optimizing the vehicle terminals' offloading decisions, their local computing capability and data transmission power, the channel time-slot allocation decision, and the computing-resource allocation decision on the edge server. Here, vehicle local computing capability refers to the vehicle's ability to process computing tasks locally; vehicle data transmission power refers to the power the vehicle uses when transmitting task data to the edge server. The channel time-slot resource allocation decision arises because the edge server occupies channel resources when transmitting energy to each vehicle, and vehicles adopting edge computing occupy channel resources when transmitting task data to the edge server; since channel resources are limited, they must be allocated appropriately to minimize delay. The edge-server computing-resource allocation decision arises because the edge server must allocate computing capacity to each vehicle adopting edge computing; since the edge server's total computing capacity is limited, it must be allocated appropriately to minimize delay.
The problem can be modeled as a mixed-integer program, but because of the combinatorial nature of multi-user computation-mode selection and its strong coupling with resource allocation, traditional numerical optimization methods cannot solve it quickly within the channel coherence time. The proposed online offloading algorithm based on deep reinforcement learning jointly optimizes five variables: the offloading decision, the vehicle's local computing capability, the vehicle's data transmission power, the channel time-slot resource allocation decision, and the edge-server computing-resource allocation decision. It obtains the optimal offloading decision without solving a combinatorial optimization, greatly reducing computational complexity; it can obtain the optimal offloading policy in a short time and minimizes the total system delay.
Parameter table
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, implemented by performing the following steps in each time frame:
Step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network;
Even when not fully trained, the deep neural network can generate relaxed offloading decisions from which quantization and computation yield the optimal offloading decision. Each obtained optimal offloading decision is combined with the wireless channel gain of the current time frame into a data sample and stored; a sample is periodically drawn from these to train the neural network, so that better offloading decisions are produced next time.
The offloading decision is a parameter representing the offloading action of a vehicle terminal, with values 0 and 1 denoting local computation and offloaded computation, respectively. The parameter x_i denotes the offloading decision of the i-th vehicle terminal in the current time frame. The parameter x_t denotes the relaxed offloading decision set generated by the neural network in each time frame; unlike x_i, which only takes the value 0 or 1, x_t is a set of N elements, each with a value between 0 and 1.
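As an illustrative sketch (not the invention's exact network), the following NumPy forward pass maps channel gains to a relaxed offloading decision x_t in (0,1)^N, using ReLU hidden layers and a Sigmoid output as in the algorithm description; the terminal count N, the 120/80 layer sizes (taken from the simulation settings), and the random weights are assumptions standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10  # illustrative number of vehicle terminals

# 120/80 hidden layers as in the simulation section; random weights stand in
# for a trained DNN.
sizes = [N, 120, 80, N]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def relaxed_offload_decision(h, params):
    """Map channel gains h to a relaxed offloading decision x_t in (0,1)^N."""
    a = h
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        # ReLU on hidden layers, Sigmoid on the output layer
        a = np.maximum(z, 0) if i < len(params) - 1 else 1 / (1 + np.exp(-z))
    return a

h = rng.uniform(0.1, 1.0, N)              # channel gains in the current time frame
x_t = relaxed_offload_decision(h, params)
```

Because the output layer is a Sigmoid, every element of x_t lies strictly between 0 and 1, as the relaxed decision requires.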
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions:
Specifically, for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t, as shown below. Each x_k is likewise a set of N elements, each taking the value 0 or 1 and representing the offloading decision of the corresponding vehicle terminal.
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise, where x_{t,i} is the i-th element of the set x_t.
Order the elements of x_t by their distance from 0.5 (that is, by the distance between each entry of x_t and the value 0.5), denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0,1}
Problem P1 is thereby transformed into the resource allocation sub-problem P2:
s.t. C2, C3, C4, C5
where X_0 and X_1 denote the sets of vehicle terminals adopting local computing and offloaded computing, respectively; that is, X_0 collects the vehicle terminals whose offloading decision in the set takes the value 0 (local computing), and X_1 collects those whose offloading decision takes the value 1 (offloaded computing).
Step 4: decompose problem P2 into two sub-problems, P3 and P4:
P3 (time-slot allocation): s.t. C4, C5
P4 (computing-resource allocation): s.t. C2, C3
Solve problem P3 with the interior-point method to obtain the time-slot allocation {a, τ}; solve problem P4 with the Lagrangian dual method to obtain the computing-resource allocation f.
Step 5: for each binary offloading decision, substitute the results obtained by solving problems P3 and P4 back into problem P1 and compute the resulting system delay; among all the binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision x*.
Once the offloading decision of every vehicle terminal is fixed, problem P1 reduces to P2; this transformation is applied directly to each of the K quantized offloading decisions in turn, and the best one is finally selected among them. Each candidate's optimum is obtained by decomposing P2 into P3 and P4, and the solution is then substituted back into P1 to evaluate its delay.
"All binary offloading decisions" here refers to the K offloading decisions the neural network produces in each time frame from the then-current wireless channel gains. The parameter symbols in the problems above do not all refer to a single vehicle terminal; they describe the offloading operation of all vehicle terminals. Likewise, neither the time-slot allocation nor the computing-resource allocation targets one particular vehicle terminal; each comprises the allocation policy for all vehicle terminals that adopt offloaded computation.
Step 6: store the obtained optimal offloading decision x* and the wireless channel gain h_i in the memory as experience-labeled data.
Step 7: every δ time frames, randomly select data samples (experience-labeled data previously stored in the memory) from the memory to train the deep neural network, and update the DNN parameters θ using the Adam algorithm.
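A minimal sketch of this store-and-train cycle (steps 6 and 7) follows. The replay memory with overwrite-oldest behavior and the cross-entropy loss mirror the detailed description later in the text, and the memory/batch/interval sizes come from the simulation settings; the network itself and the Adam step are elided, and the random decisions are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
MEMORY_SIZE, BATCH, DELTA = 1024, 128, 10  # sizes taken from the simulation section
N = 10                                     # illustrative number of vehicle terminals

class ReplayMemory:
    """Fixed-capacity memory; the oldest entry is overwritten once full."""
    def __init__(self, capacity):
        self.capacity, self.data, self.ptr = capacity, [], 0

    def store(self, h, x_star):
        if len(self.data) < self.capacity:
            self.data.append((h, x_star))
        else:
            self.data[self.ptr] = (h, x_star)   # replace the oldest sample
        self.ptr = (self.ptr + 1) % self.capacity

    def sample(self, batch):
        idx = rng.choice(len(self.data), size=min(batch, len(self.data)),
                         replace=False)
        hs, xs = zip(*(self.data[i] for i in idx))
        return np.array(hs), np.array(xs)

def cross_entropy(pred, target, eps=1e-9):
    """Loss minimized when the Adam step updates the DNN parameters theta."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

mem = ReplayMemory(MEMORY_SIZE)
for frame in range(50):
    h = rng.uniform(0.1, 1.0, N)
    x_star = rng.integers(0, 2, N)          # stand-in for the selected optimal decision
    mem.store(h, x_star)
    if frame % DELTA == 0:
        hs, xs = mem.sample(BATCH)
        # ...one Adam step on the DNN with loss = cross_entropy(dnn(hs), xs)
```

Sampling uniformly from the memory is the experience replay technique the text describes; it decorrelates consecutive time frames before each gradient step.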
Algorithm pseudo code:
the following description is presented in conjunction with specific embodiments which are meant to be illustrative, but not limiting, of the invention.
The computation model of this embodiment consists of two parts: local computation and edge computation. In the local computation mode, the computing task is executed directly on the vehicle terminal, and the delay is the local computing delay of vehicle terminal i. In the edge computation mode, the computing task is offloaded by the vehicle terminal to the edge server for execution; the delay comprises the uplink data transmission delay of vehicle terminal i, the execution delay required by the edge server to process the task transmitted by vehicle terminal i, and the result-return delay of the edge server. The delay generated in the edge server's result-return stage is ignored, so the edge computing total delay of vehicle terminal i is the sum of the uplink transmission delay and the edge execution delay.
the invention optimizes the unloading decision x of each intelligent vehicle terminal i in a combined manner i E.g., {0,1}, channel slot allocation decision { a, τ i Computation resource allocation decision f of edge server i Local computing power f of vehicle terminal i l And data transmission power P i To minimize the system utility function T (x, a, τ, f) l P), an optimization problem P0 of minimum delay is established, which is expressed as follows:
s.t.C1:x i ∈{0,1}
wherein: w is a i Calculating a weight factor, w, of the time delay for the i task of the vehicle terminal i The relative size of the vehicle unit (A) reflects the relative priority of different vehicles, namely reflects the sensitivity of calculation tasks of different vehicles to time delay, and the larger the weight factor is, the more sensitive the calculation tasks of the vehicle unit to the time delay is. f. of M Is the maximum computing power of the edge server.
For problem P0, first examine the impact of the local computing capability f_i^l and the data transmission power P_i of vehicle terminal i on the optimization:
the transmission delay generated when the energy received from the edge server is totally used for local calculationCan be minimized. At this time, the time delay is calculated locallyComprises the following steps:
When the energy received from the edge server is used entirely for data transmission, the resulting transmission delay is minimized. The minimum transmission delay is:
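The formula images for these minimum delays are not reproduced in this text. As a hedged reconstruction for the local-computation case, assuming the standard wireless-powered MEC model in which the harvested energy is E_i = μ a P h_i and running the CPU at frequency f costs k_i f² joules per cycle, the minimum local computing delay follows as:

```latex
% Assumption: harvested energy E_i = \mu a P h_i; executing \phi D_i cycles at
% frequency f_i^l consumes k_i (f_i^l)^2 \phi D_i joules.
E_i = \mu a P h_i, \qquad
k_i \bigl(f_i^l\bigr)^2 \phi D_i \le E_i
\;\Longrightarrow\;
f_i^l = \sqrt{\frac{E_i}{k_i \phi D_i}}, \qquad
t_i^l = \frac{\phi D_i}{f_i^l}
      = \phi D_i \sqrt{\frac{k_i \phi D_i}{\mu a P h_i}} .
```

Spending the entire harvested energy on computation maximizes the feasible f_i^l, which is why the delay above is the minimum claimed in the text; the symbols E_i, f_i^l, and t_i^l are introduced here for illustration only.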
After the local computing capability f_i^l and data transmission power P_i of vehicle terminal i are fixed at their optimal values, consider the utility function T(x, a, τ, f):
Problem P0 can then be transformed into problem P1:
s.t. C1: x_i ∈ {0,1}
The algorithm consists of three alternating phases: offloading decision generation, channel and computing resource allocation, and offloading policy update. The algorithm framework is shown in FIG. 2.
Offloading decision generation phase: the wireless channel gains h_i observed within the time frame are taken as input, and a DNN establishes the mapping to the relaxed offloading decision x_t as output. The ReLU function serves as the activation function of the hidden layers and the Sigmoid function as the activation function of the output layer, so that each element of the output relaxed offloading decision satisfies x_{t,i} ∈ (0,1). The relaxed offloading decision x_t is quantized with an order-preserving quantization method: for a given 1 ≤ K ≤ N + 1, a set of K quantized offloading decisions {x_k} is generated from x_t, as follows:
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise.
Order the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Channel and computing resource allocation phase: the problem is decomposed into the two sub-problems of offloading decision and resource allocation, as shown in FIG. 1.
The resource allocation sub-problem P2 is expressed as:
s.t. C2, C3, C4, C5
It is further decomposed into two sub-problems, solving the transmission delay and solving the computation delay, expressed respectively as:
P3: s.t. C4, C5
P4: s.t. C2, C3
For the transmission-delay sub-problem P3, the time-slot allocation {a, τ} is solved with the interior-point method; for the computation-delay sub-problem P4, the computing-resource allocation f is solved with the Lagrangian dual method.
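If the edge execution delay of terminal i takes the form w_i φ D_i / f_i (an assumed model consistent with the symbols defined for P2; the patent's exact objective images are not reproduced here), the Lagrangian dual of P4 admits a closed form: stationarity gives f_i proportional to √(w_i φ D_i), scaled so the budget f_M is used with equality. A hypothetical sketch:

```python
import numpy as np

def allocate_compute(w, D, phi, f_M):
    """Closed-form stationary point of the Lagrangian of P4 (assumed model):
    minimize sum_i w_i * phi * D_i / f_i  subject to  sum_i f_i <= f_M, f_i >= 0.
    Setting d/df_i = -w_i*phi*D_i/f_i**2 + lam = 0 gives f_i = sqrt(w_i*phi*D_i/lam);
    the multiplier lam is eliminated by enforcing sum_i f_i = f_M."""
    c = np.sqrt(np.asarray(w, float) * phi * np.asarray(D, float))
    return f_M * c / c.sum()

# Illustrative values: weights follow the 1 / 1.5 rule, D_i and f_M from the table.
w = np.array([1.0, 1.5, 1.0])
D = np.array([5e6, 5e6, 5e6])
f = allocate_compute(w, D, phi=100, f_M=8e9)
```

By the Cauchy-Schwarz inequality this allocation is the exact minimizer for this objective form, so no iterative dual search is needed; the higher-weighted terminal receives proportionally more computing resource.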
After the optimal channel time-slot allocation and edge-server computing-resource allocation have been solved, the channel time-slot allocation decisions {a, τ_i} and edge-server computing-resource allocation decisions f_i computed for the K offloading decisions are substituted into problem P1 to compute the total delay; the minimum total delay is selected as the optimal computation delay, and the corresponding offloading decision is the optimal offloading decision.
Offloading policy update phase: an initially empty memory of limited capacity is maintained. At the end of each time frame, the optimal offloading decision x* and the corresponding wireless channel gain h_i are stored, and the DNN is trained with the experience-labeled data in the memory. When the memory is full, the oldest data are replaced with the newest to keep the training samples reliable. Using an experience replay technique, a set of data samples is randomly selected from the memory for training, and the Adam algorithm updates the DNN parameters θ by minimizing the cross-entropy loss. After enough experience-labeled data have been collected, the DNN is trained only once every δ time frames.
The invention obtains simulation results by running simulations with TensorFlow 1.0 in Python; see FIGS. 3 to 8.
We use a Powercast TX91501-3W transmitter with P = 3 W as the energy transmitter of the edge server, and a P2110 harvester as the energy receiver of each vehicle terminal. Denote by d_i the distance from the i-th vehicle terminal to the edge server; the d_i are uniformly distributed over (2.5, 5.2) meters, i.e., the vehicle terminals are uniformly distributed in a ring centered on the edge server with inner and outer radii of 2.5 and 5.2 meters, ensuring that the edge server's access point covers all vehicle terminals. Without loss of generality, we assume the wireless channel gain remains constant within one time frame and changes independently between time frames. The weighting factor of vehicle terminal i is assigned by the following rule: if the index of vehicle terminal i is odd, w_i is set to 1, otherwise to 1.5. The neural network consists of one input layer, two hidden layers, and one output layer, where the first and second hidden layers contain 120 and 80 neurons, respectively. The training interval is set to δ = 10, the training batch size to |T| = 128, the memory size to 1024, and the learning rate of the Adam optimizer to 0.01. Other parameter settings are shown in the following table:
Simulation parameter | Setting
---|---
Bandwidth W | 2 MHz
Computing task data size D_i | 5×10^6 bytes
Channel noise power σ² | 1.0×10^-8
Edge server computing power f_M | 8×10^9 cycles/s
Edge server energy transmission power P | 0.5 W
CPU cycles required per bit φ | 100
Transmission overhead coefficient β_i | 4
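The terminal placement and weight rule of this setup can be sketched as follows (illustrative; the function and field names are ours, and the deterministic seed is only for reproducibility of the sketch):

```python
import random

def make_terminals(n, rng=random.Random(0)):
    """Generate n vehicle terminals: distance d_i ~ U(2.5, 5.2) m from the
    edge server, and weight w_i = 1 for odd indices, 1.5 for even (1-based),
    as in the simulation description."""
    terminals = []
    for i in range(1, n + 1):
        d_i = rng.uniform(2.5, 5.2)          # uniform in the ring's radii
        w_i = 1.0 if i % 2 == 1 else 1.5     # parity-based weight rule
        terminals.append({"id": i, "d": d_i, "w": w_i})
    return terminals

terms = make_terminals(4)
```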
All simulation experiments were run on a computer equipped with an Intel Pentium CPU G860.
FIG. 3 shows the convergence of the neural network training loss. As the number of training steps increases, the training loss gradually decreases and converges to a small value; with further training, it remains essentially stable. The algorithm can therefore be considered to converge rapidly in a vehicular environment with time-varying channel conditions, allowing offloading decisions to be made quickly.
Define the optimal system delay obtained by enumerating all binary offloading decisions with a greedy algorithm, and let T*(h, x) be the delay obtained by the proposed algorithm. Define the gain ratio as the ratio of the greedy-enumerated optimal delay to T*(h, x); it measures how effectively the proposed algorithm finds the optimal delay. The evolution of the gain ratio is shown in FIG. 4. At the start of training, the gain ratio is about 0.9, so the delay obtained by the proposed algorithm still differs noticeably from the optimal delay computed by the greedy algorithm. As training proceeds, the gain ratio approaches 1, meaning the gap between the delays obtained by the two algorithms keeps shrinking. After 2100 training steps, the gain ratio essentially stays at 1 and no longer changes; at that point the difference between the solutions of the two algorithms is negligible, and the proposed algorithm can be considered to have converged to the optimal solution.
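For one time frame, the gain ratio is simply the ratio of the two delays (a hypothetical helper, not from the patent, shown only to make the metric concrete):

```python
def gain_ratio(t_enum, t_alg):
    """Ratio of the greedy-enumerated optimal delay t_enum to the delay
    t_alg achieved by the learned policy. Since t_alg >= t_enum, the
    ratio is at most 1 and approaches 1 as the policy converges."""
    return t_enum / t_alg

# Early in training: optimal delay 0.9 s vs. policy delay 1.0 s.
r = gain_ratio(0.9, 1.0)
```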
FIGS. 5 to 8 show the effects of the Adam optimizer learning rate, the batch size, the memory size, and the neural network training interval on the convergence and effectiveness of the present invention.
The robustness of the invention is analyzed in a scenario with alternating weights. At the beginning of the simulation experiment, the weights of all vehicle terminals are determined by the parity of their indices: for odd-numbered vehicle terminals the weight is set to 1, and for even-numbered ones to 1.5. When the number of training steps reaches 2500, the weights of the odd- and even-numbered vehicle terminals are swapped; finally, at 3500 training steps, the weights are restored to their original state. The optimal delay under each weight assignment is obtained by the greedy algorithm to compute the gain ratio. The training loss and the gain ratio are shown in FIGS. 9 and 10. Despite the weight changes, the training loss and the gain ratio are not affected, and a solution close to the optimal one can still be obtained. This shows that the invention can quickly and automatically adjust the offloading policy to the changed weights and eventually converge to a new optimal offloading decision.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (4)
1. An intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, characterized in that the method comprises the following steps:
step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision x_t through a deep neural network;
step 2: quantize the relaxed offloading decision x_t generated in step 1 into K binary offloading decisions by the order-preserving quantization method;
step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0, 1}
where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N), with x_i = 1 indicating that the computing task of vehicle terminal i is offloaded to the edge server and x_i = 0 indicating that it is executed locally; f_M is the total computing resource of the edge server; f_i is the computing resource allocated by the edge server to vehicle terminal i; T_i^c is the total edge computing delay of vehicle terminal i; w_i is the weight factor of the task delay of vehicle terminal i; T_i^l is the local computing delay of vehicle terminal i; a is the fraction of the time frame during which the edge server transmits energy to the vehicle terminals; and τ_i is the fraction of the time frame allocated to the channel of vehicle terminal i;
transform problem P1 into the resource allocation subproblem P2:
s.t. C2, C3, C4, C5
where φ is the number of CPU cycles required to process 1 bit of task data; D_i is the data size of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmission power of the edge server; h_i is the wireless channel gain in the i-th time period; k_i is the computing energy efficiency coefficient of vehicle terminal i; β_i is the transmission overhead coefficient of the uplink of vehicle terminal i; X_0 and X_1 denote the sets of vehicles performing local computing and offloaded computing, respectively; W is the bandwidth of the wireless channel; and σ² is the Gaussian noise spectral density of the wireless channel;
step 4: decompose problem P2 into two subproblems, P3 and P4:
s.t. C4, C5
s.t. C2, C3
solve problem P3 to obtain the time-slot allocation {a, τ}; solve problem P4 to obtain the computing resource allocation f;
step 5: for each binary offloading decision, substitute the solutions of problems P3 and P4 into problem P1 and compute the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision;
step 6: store the obtained optimal offloading decision together with the wireless channel gain h_i in a memory as experience-label data;
step 7: every δ time frames, randomly select data samples from the memory, train the deep neural network, and update its parameters θ; then return to step 1 until the method terminates.
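The per-frame loop of steps 1–7 can be sketched end to end (illustrative: stub functions stand in for the DNN, the quantizer, and the solvers of P3/P4; none of the names come from the patent, and a real system would update the DNN parameters θ where noted):

```python
import random

def dnn_relaxed_decision(h, n):
    """Step 1 (stub DNN): map channel gain h to a relaxed decision in [0,1]^n."""
    random.seed(int(h * 1e6))            # deterministic stand-in for a forward pass
    return [random.random() for _ in range(n)]

def quantize(x_t, k):
    """Step 2 (stub): produce up to k binary candidates from the relaxed decision."""
    return [[1 if v > 0.5 else 0 for v in x_t]][:k]

def system_delay(h, x_k):
    """Steps 3-4 (stub): pretend delay; a real system solves P1 via P3/P4 here."""
    return sum(x_k) / (h + 1e-9) + (len(x_k) - sum(x_k))

def run_frames(channel_gains, n=4, k=1, delta=2):
    memory = []                          # step 6: experience-label memory
    for t, h in enumerate(channel_gains):
        x_t = dnn_relaxed_decision(h, n)
        candidates = quantize(x_t, k)
        best = min(candidates, key=lambda x: system_delay(h, x))  # step 5
        memory.append((h, best))
        if t % delta == 0 and memory:    # step 7: periodic training
            batch = random.sample(memory, min(2, len(memory)))
            _ = batch                    # a real system would run an Adam step on θ
    return memory

mem = run_frames([0.5, 1.0, 1.5, 2.0])
```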
2. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein the total edge computing delay T_i^c of vehicle terminal i is T_i^c = T_i^r + T_i^es, where T_i^r is the uplink data transmission delay of vehicle terminal i and T_i^es is the execution delay required by the edge server to execute the task transmitted by vehicle terminal i.
3. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein in step 2, the process of quantizing the relaxed offloading decision x_t into K binary offloading decisions is:
for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed offloading decision x_t;
the first binary offloading decision x_{1,i}, generated when k = 1, is:
x_{1,i} = 1 if x_{t,i} > 0.5, and x_{1,i} = 0 if x_{t,i} ≤ 0.5,
where x_{t,i} is the i-th element of the offloading decision x_t;
sort the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(i)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t; based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k is:
x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
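The order-preserving quantization of this claim can be sketched in Python (illustrative; the function name is ours, and the tie-breaking rule for elements equal to the threshold follows the standard order-preserving quantization construction):

```python
def order_preserving_quantize(x_t, K):
    """Generate K binary offloading decisions from a relaxed decision x_t.

    Decision 1 thresholds each element at 0.5; decision k (k >= 2)
    thresholds at the (k-1)-th order statistic of x_t when the elements
    are ranked by their distance to 0.5.
    """
    n = len(x_t)
    assert 1 <= K <= n + 1
    # Order statistics: elements of x_t sorted by |x - 0.5|, ascending.
    ordered = sorted(x_t, key=lambda v: abs(v - 0.5))
    decisions = [[1 if v > 0.5 else 0 for v in x_t]]        # x_1
    for k in range(2, K + 1):
        thr = ordered[k - 2]                                 # x_{t,(k-1)}
        x_k = [1 if (v > thr or (v == thr and thr <= 0.5)) else 0
               for v in x_t]
        decisions.append(x_k)
    return decisions

# For x_t = [0.2, 0.8, 0.6] and K = 2:
ds = order_preserving_quantize([0.2, 0.8, 0.6], K=2)
```

Because consecutive thresholds move outward from 0.5, the K candidates flip at most one additional element at a time, keeping the candidate set small yet diverse.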
4. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein in step 4, problem P3 is solved with the interior point method to obtain the time-slot allocation {a, τ}, and problem P4 is solved with the Lagrangian dual method to obtain the computing resource allocation f.
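As an illustration of the Lagrangian dual step, suppose (an assumption on our part, since the patent's P4 objective is given only as an image; this form is common in edge-computing resource allocation) that P4 is min Σ_{i∈X_1} w_i φ D_i / f_i subject to Σ f_i ≤ f_M and f_i ≥ 0. The KKT stationarity condition −w_i φ D_i / f_i² + λ = 0 then yields the closed-form allocation f_i ∝ √(w_i D_i):

```python
from math import sqrt, isclose

def allocate_compute(weights, data_sizes, f_M):
    """Closed-form KKT solution of the assumed P4:
    f_i* = f_M * sqrt(w_i * D_i) / sum_j sqrt(w_j * D_j).
    (The cycles-per-bit factor φ cancels in the normalization.)"""
    s = [sqrt(w * d) for w, d in zip(weights, data_sizes)]
    total = sum(s)
    return [f_M * si / total for si in s]

# Two offloading terminals with the simulation's weights and task size.
f = allocate_compute([1.0, 1.5], [5e6, 5e6], 8e9)
```

The heavier-weighted terminal receives proportionally more of the edge server's cycles, which is the qualitative behavior one expects from the dual solution regardless of the exact constants.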
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211048897.6A CN115460710B (en) | 2022-08-30 | 2022-08-30 | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115460710A true CN115460710A (en) | 2022-12-09 |
CN115460710B CN115460710B (en) | 2024-08-23 |
Family
ID=84300120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211048897.6A Active CN115460710B (en) | 2022-08-30 | 2022-08-30 | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115460710B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116166444A (en) * | 2023-04-26 | 2023-05-26 | 南京邮电大学 | Collaborative reasoning method oriented to deep learning hierarchical model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20210063990A (en) * | 2019-11-25 | 2021-06-02 | 경희대학교 산학협력단 | Method of machine learning based unmanned aerial vehicle mobile edge server collabrative task matching and offloading |
CN113296845A (en) * | 2021-06-03 | 2021-08-24 | 南京邮电大学 | Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment |
CN114025359A (en) * | 2021-11-01 | 2022-02-08 | 湖南大学 | Resource allocation and computation unloading method, system, device and medium based on deep reinforcement learning |
US20220210686A1 (en) * | 2020-07-15 | 2022-06-30 | Nantong University | Energy-efficient optimized computing offloading method for vehicular edge computing network and system thereof |
Non-Patent Citations (4)
Title |
---|
MINGHUI MIN; LIANG XIAO; YE CHEN; PENG CHENG; DI WU; WEIHUA ZHUANG: "Learning-based computation offloading for IoT devices with energy harvesting", IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 1 January 2019 (2019-01-01) * |
MOLIN LI; TONG CHEN; JIAXIN ZENG; XIAOBO ZHOU; KEQIU LI; HENG QI: "D2D-Assisted Computation Offloading for Mobile Edge Computing Systems with Energy Harvesting", 2019 20TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 12 March 2020 (2020-03-12) * |
ZHANG ZEWEI, LI TAOSHEN, YANG LINFENG: "Multi-objective optimized SWIPT-MEC hierarchical task offloading architecture and optimization algorithm", JOURNAL OF GUANGXI UNIVERSITY, 30 April 2022 (2022-04-30) *
WANG YANTING: "Research on joint scheduling strategies for heterogeneous resources in mobile edge computing", INFORMATION SCIENCE AND TECHNOLOGY SERIES, 15 February 2020 (2020-02-15) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111245651B (en) | Task unloading method based on power control and resource allocation | |
Bi et al. | Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks | |
CN110928654B (en) | Distributed online task unloading scheduling method in edge computing system | |
CN111556461A (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
CN112105062B (en) | Mobile edge computing network energy consumption minimization strategy method under time-sensitive condition | |
Yao et al. | Caching in dynamic IoT networks by deep reinforcement learning | |
Xie et al. | Backscatter-assisted computation offloading for energy harvesting IoT devices via policy-based deep reinforcement learning | |
CN114650228B (en) | Federal learning scheduling method based on calculation unloading in heterogeneous network | |
CN110233755B (en) | Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things | |
CN113727362B (en) | Unloading strategy method of wireless power supply system based on deep reinforcement learning | |
CN114554495B (en) | Federal learning-oriented user scheduling and resource allocation method | |
Bi et al. | Stable online computation offloading via lyapunov-guided deep reinforcement learning | |
CN115460710B (en) | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
He et al. | Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework | |
Huang et al. | Performance optimization for energy-efficient industrial Internet of Things based on ambient backscatter communication: An A3C-FL approach | |
Han et al. | Multi-step reinforcement learning-based offloading for vehicle edge computing | |
Jiao et al. | Deep reinforcement learning for time-energy tradeoff online offloading in MEC-enabled industrial internet of things | |
CN114521023A (en) | SWIPT-assisted NOMA-MEC system resource allocation modeling method | |
CN116341679A (en) | Design method of federal edge learning scheduling strategy with high aging | |
CN115914230A (en) | Adaptive mobile edge computing unloading and resource allocation method | |
CN113157344B (en) | DRL-based energy consumption perception task unloading method in mobile edge computing environment | |
Gao et al. | Deep reinforcement learning-based computation offloading and optimal resource allocation in industrial Internet of Things with NOMA | |
Wang et al. | Adaptive compute offloading algorithm for metasystem based on deep reinforcement learning | |
CN114281527A (en) | Low-complexity mobile edge computing resource allocation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||