CN116782296A - Digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles

Info

Publication number: CN116782296A
Application number: CN202310611900.9A
Authority: CN
Inventors: 焦文静; 林艳; 张一晋; 张伟斌; 李骏
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2023-05-29
Filing date: 2023-05-29
Publication date: 2023-09-19

Abstract

The invention discloses a digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles, comprising the following steps: inputting a digital twin-based edge Internet of Vehicles environment, and initializing the actor-critic network parameters and the parameters of a decomposition-based multi-objective evolutionary algorithm; performing an offloading action to obtain a reward vector and a completion flag, and storing them in a buffer; fitting the value function to the buffer data, and calculating an advantage estimate based on the current value function and the discounted return; updating the solution set and fitness values using the decomposition-based multi-objective evolutionary algorithm and the reward vector, and returning the Pareto-optimal solution as the learning parameters of the actor network; constraining the policy update by a clipping (truncation) method, calculating the loss function and updating the policy; and, when the completion flag is true, resetting the edge Internet of Vehicles environment to start the next round. The method is suited to unknown, dynamic edge Internet of Vehicles environments, where digital twin-assisted intelligent edge offloading decisions achieve a long-term trade-off among minimizing delay, energy consumption and cloud computing cost.

Description

Digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles

Technical Field

The invention belongs to the technical field of wireless communication, and relates to a digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles.

Background

With the spread of on-board electronic equipment, vehicles are becoming steadily more intelligent, and the application scenarios of the Internet of Vehicles have expanded from the traditional mobile Internet to intelligent transportation (Zhang J, Letaief K B. Mobile Edge Intelligence and Computing for the Internet of Vehicles [J]. Proceedings of the IEEE, 2019, 108(2): 246-261.). Applying mobile edge computing (Mobile Edge Computing, MEC) to the Internet of Vehicles can meet requirements such as real-time operation, safety and intelligence, save the delay and energy consumption caused by transmitting large amounts of data over long distances, relieve the task computing pressure on a single intelligent vehicle, and improve the data computing efficiency of the whole Internet of Vehicles system (Lin Y, Zhang Y, Li J, et al. Popularity-Aware Online Task Offloading for Heterogeneous Vehicular Edge Computing Using Contextual Clustering of Bandits [J]. IEEE Internet of Things Journal, 2022, 9(7): 5422-5433.).

A digital twin integrates multi-physics, multi-scale and multi-disciplinary attributes, is characterized by real-time synchronization, faithful mapping and high fidelity, and enables interaction and fusion between the physical world and the information world (Tao Fei, Liu Weiran, Liu Jianhua. Digital twinning and application exploration [J]. Computer Integrated Manufacturing Systems, 2018, 24(01): 1-18.). When digital twin technology assists edge computing offloading, Internet of Vehicles applications only need to upload data to the digital twin network on the edge equipment. This effectively reduces the delay among vehicles and between vehicles and the edge server, makes information transmission more timely and accurate, and improves the performance and usability of the system.

Given the real-time and aggregated nature of digital twin data, researchers have considered using the data in the digital twin network together with reinforcement learning theory to find an optimal edge computing task offloading policy, so as to improve the efficiency and accuracy of policy learning (Lu Y, Maharjan S, Zhang Y. Adaptive Edge Association for Wireless Digital Twin Networks in 6G [J]. IEEE Internet of Things Journal, 2021, 8(22): 16219-16230.). With digital twin technology, the Internet of Vehicles can not only provide data of the virtual space, but also infer information that assists task offloading decisions from the data generated by the digital twin and the environment. This greatly reduces the burden of subsequent task offloading decisions, reduces the amount of data required to train a reinforcement learning algorithm, and improves the learning efficiency of resource management (Zhang K, Cao J, Zhang Y. Adaptive Digital Twin and Multiagent Deep Reinforcement Learning for Vehicular Edge Computing and Networks [J]. IEEE Transactions on Industrial Informatics, 2021, 18(2): 1405-1413.). Although digital twins make data analysis efficient and convenient, the data estimated by the digital twin network in the edge server has a non-negligible error relative to the real data of the Internet of Vehicles in the physical world. To reduce the error between the digital twin and the physical entity, many scholars have carried out further research in recent years (Yuan X, Chen J, Zhang N, et al. Digital Twin-Driven Vehicular Task Offloading and IRS Configuration in the Internet of Vehicles [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 24290-24304.).
However, these studies mainly use relatively traditional reinforcement learning algorithms, such as the DDQN (Double Deep Q-Network) algorithm, which have low data efficiency and only moderate stability, and cannot fully exploit the good real-time performance of the digital twin. In addition, delay and energy consumption are two conflicting optimization objectives. Some researchers use a weighted-sum method and take the total reward as the optimization objective, which does not fully address the multi-objective optimization problem, so the offloading decisions for Internet of Vehicles edge computing tasks cannot be optimized to the greatest extent.

Disclosure of Invention

The invention aims to provide a digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles, which minimizes the long-term task offloading cost while balancing multiple objectives such as delay, energy consumption and cloud computing cost, and achieves low-cost, high-efficiency intelligent task offloading.

The technical solution for achieving this aim is as follows: a digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles, comprising the following steps:

step 1, inputting a digital twin-based edge Internet of Vehicles environment, initializing the actor-critic network parameters, and initializing the parameters of the decomposition-based multi-objective evolutionary algorithm;

step 2, in the current time slot, each vehicle user selects an offloading scheme according to the action generated in the previous time slot, obtains the reward vector and the completion flag fed back by the environment, and stores them in a buffer;

step 3, fitting the value function to the buffer data by mean square error regression, and calculating an advantage estimate based on the current value function and the discounted return;

step 4, updating the solution set and the fitness values using the decomposition-based multi-objective evolutionary algorithm and the reward vector, and returning the Pareto-optimal solution set that optimizes the fitness value as the learning parameters of the actor network;

step 5, constraining the policy update by a clipping (truncation) method, calculating the loss function of the actor-critic network, and updating the policy;

step 6, when the completion flag is true, ending the current round and starting the next round, re-inputting the digital twin-based edge Internet of Vehicles environment, and repeating steps 2 to 5.
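The round structure of steps 2 and 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: `ToyEnv`, `random_policy` and `run_round` are hypothetical names, the environment returns placeholder rewards, and the learning updates of steps 3 to 5 are omitted.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the digital twin-based edge environment."""

    def __init__(self, n_vehicles=3, slots_per_round=5, seed=0):
        self.n_vehicles = n_vehicles
        self.slots_per_round = slots_per_round
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * self.n_vehicles  # placeholder state

    def step(self, actions):
        self.t += 1
        state = [float(self.t)] * self.n_vehicles
        # Placeholder two-objective reward vector (negative costs).
        reward_vec = (-self.rng.random(), -self.rng.random())
        done = self.t >= self.slots_per_round  # completion flag
        return state, reward_vec, done


def random_policy(state, rng):
    # 0 = local, 1 = MEC, 2 = cloud; one choice per vehicle.
    return [rng.choice((0, 1, 2)) for _ in state]


def run_round(env, rng):
    """Collect one round of transitions into a buffer (steps 2 and 6)."""
    buffer = []
    state = env.reset()
    done = False
    while not done:
        actions = random_policy(state, rng)
        next_state, reward_vec, done = env.step(actions)
        buffer.append((state, actions, reward_vec, done))
        state = next_state
    return buffer


buffer = run_round(ToyEnv(), random.Random(1))
print(len(buffer), buffer[-1][3])
```

Each buffer entry holds the state, the action vector, the reward vector and the completion flag, which is the data the later steps would consume.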

Further, the digital twin-based edge Internet of Vehicles environment input in step 1 comprises:

(1) Time slot model: the continuous training time is discretized into a number of time slots, and a positive integer t ∈ {1, 2, …, T} denotes the t-th time slot. It is assumed that each vehicle user completes one edge buffer decision and one position movement within a single time slot, and that environment states such as transmit power and channel noise do not change within a single time slot. One round is complete when all vehicle users have reached the set road end point;

(2) Network model: a two-layer digital twin Internet of Vehicles edge network model is established, consisting of a physical entity layer and a digital twin layer. Suppose that the physical entity layer includes N vehicle users and one base station equipped with an MEC server. The set of vehicle users is denoted V = {v_1, v_2, …, v_N}. The digital twin (DT) of vehicle n in time slot t is represented by two quantities: the DT's estimate of the actual task computation frequency of vehicle n in time slot t, and the error between that actual task computation frequency and the DT's estimate. The digital twins of all vehicles, the base station and the MEC server constitute the digital twin layer.

(3) Communication model: one-way traffic is assumed. A vehicle can communicate with the base station over a wireless connection, and can communicate with the cloud server and the digital twin network through the base station along the road. The road is assumed to be an open area without obstacles, the influence of environmental factors on the path loss is not considered, and a simple free-space model is adopted as the path loss model. The channel gain between the vehicle and the base station can be expressed as

Let β_n ~ e^μ be the Rayleigh fading of the channel between vehicle user n and the base station, where μ is the corresponding scale parameter. Let λ be the path loss factor; the expression also uses the distance between vehicle n and the base station in time slot t.

The information transmission process is subject to various kinds of interference, and Gaussian white noise can be used as a general noise model to describe the noise interference in the channel. The information transmission rate between vehicle n and the base station in time slot t can be expressed as

where i is the vehicle index, B_n is the available bandwidth allocated to vehicle n by the base station, the transmit power is that of vehicle n toward the base station in time slot t, and σ² is the power of the Gaussian white noise.
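The two expressions of the communication model are not reproduced in this text. The following Python sketch shows one common form that is consistent with the definitions given, under explicit assumptions: the gain is the fading coefficient multiplied by the distance raised to the negative path loss factor, the fading coefficient is drawn from an exponential distribution, and the rate is the Shannon capacity with Gaussian white noise only (any inter-vehicle term is ignored). The function names and all numeric values are hypothetical.

```python
import math
import random


def channel_gain(distance_m, path_loss_exp, fading):
    """Assumed form: fading coefficient times distance^(-path loss factor)."""
    return fading * distance_m ** (-path_loss_exp)


def transmission_rate(bandwidth_hz, tx_power_w, gain, noise_power_w):
    """Shannon capacity with Gaussian white noise only (no interference term)."""
    snr = tx_power_w * gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


rng = random.Random(0)
# Assumption: exponentially distributed fading with rate 1.0 (scale 1.0).
fading = rng.expovariate(1.0)
g = channel_gain(distance_m=100.0, path_loss_exp=2.0, fading=fading)
rate = transmission_rate(
    bandwidth_hz=1e6, tx_power_w=0.1, gain=g, noise_power_w=1e-13
)
print(f"gain={g:.3e}, rate={rate / 1e6:.2f} Mbit/s")
```

Under these assumptions the rate grows with allocated bandwidth and transmit power and shrinks as the vehicle moves away from the base station.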

(4) Computation model: the task of vehicle n in time slot t is described by a two-tuple, whose first element is the amount of task data sent by vehicle n in time slot t and whose second element is the number of computation cycles required per bit of that task data. A delay error exists when the digital twin and the physical entity exchange data in real time. The error between the true computation delay in time slot t and the delay estimated by the digital twin can be expressed as

The task computation time of vehicle n in time slot t is

The task transmission time of vehicle n in time slot t is

The task transmission energy consumption of vehicle n in time slot t is

The computation energy consumption when the task of vehicle n in time slot t is computed locally is

The computation energy consumption when the task of vehicle n in time slot t is offloaded to the MEC server is

The cost of renting cloud computing cannot be ignored. Assume that the unit cost of renting cloud computing is given in dollars and that the task of vehicle n in time slot t incurs a cost paid to the cloud computing platform.

The cloud computing cost of the task of vehicle n in time slot t is

where μ is a price factor related to the cloud service provider.

The total time cost, energy cost and cloud computing cost of offloading the computing task of vehicle n in time slot t are

Further, in step 2, in the current time slot each vehicle user selects an offloading scheme according to the action generated in the previous time slot, obtains the reward vector and the completion flag fed back by the environment, and stores them in the buffer, specifically:

(1) Action of vehicle user

The action vector selected by the vehicle users in time slot t can be expressed as

a(t) = [a_1(t), a_2(t), …, a_N(t)]

For the action a_n(t) of vehicle n in time slot t, a value of 0 indicates local computation, a value of 1 indicates computation on the MEC server, and a value of 2 indicates computation in the cloud.

(2) System rewards

The method has two optimization objectives: minimizing the offloading delay of the vehicle users' computing tasks, and minimizing the energy consumption and cloud computing cost. A parameter α is used to reduce the reward value when the cost value is large. For the objective of minimizing delay, the system reward in time slot t is set to

For the objective of minimizing energy consumption and cloud computing cost, the system reward is set to

Combining the system rewards of the two objectives gives the reward vector fed back by the environment

(3) Completion flag

It is assumed that vehicles travel at a constant speed in one direction, and that each vehicle user performs one movement and position update per time slot. If all vehicle users have travelled from their respective start points to the end point of the specified path, the round ends and the returned completion flag done is true; otherwise done is false.
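The reward vector and completion flag described above can be sketched in Python. The reward expressions are not reproduced in this text, so the sketch assumes each objective's reward is the negated cost scaled by α; that form, the function names and the numeric values are all assumptions. The completion flag follows the description directly: it is true only when every vehicle has reached the end point.

```python
def reward_vector(delay, energy, cloud_cost, alpha=0.1):
    """Assumed form: each objective's reward is the negated, alpha-scaled cost."""
    r_delay = -alpha * delay
    r_energy_cost = -alpha * (energy + cloud_cost)
    return (r_delay, r_energy_cost)


def advance(positions, speed_mps, slot_s, road_end_m):
    """Move every vehicle once per slot at constant speed, capped at the end."""
    return [min(p + speed_mps * slot_s, road_end_m) for p in positions]


def completion_flag(positions, road_end_m):
    """done is True only when all vehicles have reached the end of the road."""
    return all(p >= road_end_m for p in positions)


positions = [0.0, 500.0, 990.0]
slots = 0
while not completion_flag(positions, 1000.0):
    positions = advance(positions, speed_mps=10.0, slot_s=1.0, road_end_m=1000.0)
    slots += 1
print(slots)  # 100
```

The round length is set by the vehicle that starts farthest from the end point, here the one starting at 0 m.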

Further, in step 3, the value function is fitted to the buffer data by mean square error regression, and the advantage estimate is calculated based on the current value function and the discounted return, specifically:

The value function is fitted by mean square error regression; the value function is

Let V_φ(s_t) be the current value function in the Markov decision process, γ the discount factor and s_t the state in time slot t; the reward vector of time slot t also enters the expression. The discounted return is expressed as

The advantage estimation function is expressed as

Further, in step 4, the solution set and the fitness values are updated using the decomposition-based multi-objective evolutionary algorithm and the reward vector, and the Pareto-optimal solution set that optimizes the fitness value is returned as the learning parameters of the actor network, specifically:

The initial fitness vector F of the decomposition-based multi-objective evolutionary algorithm is the reward vector. The Chebyshev method can be used to determine a fitness value that evaluates the overall performance:

f = max_i (ω_i · |F[i] − z_i|)

where ω_i is the weight of the i-th objective function and z_i is the reference point of the i-th objective function. The fitness values of the solutions are compared to determine Pareto optimality, and the solution set that optimizes the fitness value is returned as the learning parameters of the actor network.
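The Chebyshev fitness value given by the formula above can be evaluated directly. In the Python sketch below, the candidate vectors, weights and reference point are hypothetical, and it is assumed that optimizing the fitness value means choosing the smallest one, as is usual for this scalarization.

```python
def tchebycheff(fitness, weights, reference):
    """f = max_i( w_i * |F[i] - z_i| ), following the formula in the text."""
    return max(w * abs(f - z) for f, w, z in zip(fitness, weights, reference))


def best_solution(candidates, weights, reference):
    """Return the candidate with the smallest scalarized fitness value."""
    return min(candidates, key=lambda F: tchebycheff(F, weights, reference))


# Hypothetical two-objective reward vectors (delay, energy plus cloud cost).
candidates = [(-2.0, -1.0), (-1.0, -3.0), (-1.5, -1.5)]
weights = (0.5, 0.5)
reference = (0.0, 0.0)  # hypothetical ideal point: zero cost on both objectives

print([tchebycheff(F, weights, reference) for F in candidates])  # [1.0, 1.5, 0.75]
print(best_solution(candidates, weights, reference))             # (-1.5, -1.5)
```

Because the value is driven by the worst-performing objective, the balanced candidate wins here even though each of the other two is better on one objective.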

Further, in step 5, the policy update is constrained by a clipping (truncation) method, the loss function of the actor-critic network is calculated and the policy is updated, specifically:

The proximal policy optimization (PPO) algorithm uses the clip method to limit the magnitude of each policy update. At each update, the clip method calculates the ratio between the new policy and the old policy and then limits this ratio to a range controlled by the hyperparameter ε. The loss function can thus be defined as

where clip(·) is the truncation function.

If the advantage estimate is positive, the return generated by the current action is greater than the expected return of the reference action, so the update increases the probability of the action, but to no more than 1+ε times that under the original policy; otherwise, if the advantage estimate is negative, the probability of the action is reduced, but to no less than 1−ε times that under the original policy. Finally, the policy is updated by maximizing the loss function L^CLIP(θ).

Compared with the prior art, the invention has the following notable advantages:

(1) By applying digital twin technology and mobile edge computing, the real-time performance and accuracy of the data are improved, the data computing efficiency and intelligence level of the Internet of Vehicles are raised, and the problems of high delay and high energy consumption in the Internet of Vehicles are alleviated. (2) By limiting the policy update magnitude with the clipping method, the stability and convergence speed of the algorithm are greatly improved. (3) The multi-objective optimization problem is solved effectively: the long-term task offloading cost is minimized in the trade-off among multiple objectives such as delay, energy consumption and cloud computing cost, achieving low-cost, high-efficiency intelligent task offloading.

Drawings

FIG. 1 is a flow chart of the digital twin-based multi-objective decision method for Internet of Vehicles edge computing offloading of the present invention.

FIG. 2 is a schematic diagram of the digital twin-based vehicle edge network topology in an embodiment of the present invention.

FIG. 3 is a schematic diagram of the learning convergence of the different schemes in an embodiment of the present invention.

FIG. 4 is a comparison of the loss function values of the different schemes over time in an embodiment of the present invention.

FIG. 5 is a comparison of the reward values of the different schemes for different task data sizes in an embodiment of the present invention.

FIG. 6 is a comparison of the reward values of the different schemes for different numbers of vehicle users in an embodiment of the present invention.

FIG. 7 is a comparison of the reward values of the different schemes for different numbers of MEC servers in an embodiment of the present invention.

Detailed Description

The technical solution for achieving the aim of the invention is as follows: with reference to FIGS. 1-2, the digital twin-based multi-objective decision method for Internet of Vehicles edge computing offloading comprises steps 1 to 6 as set out in the Disclosure of Invention above, with the same environment models, the same definitions of action, system reward and completion flag, and the same advantage estimation, fitness evaluation and clipped policy update.

The invention will be described in further detail with reference to the accompanying drawings and specific examples.

Examples

One embodiment of the invention is described in detail below. The simulation is programmed in Python, and the parameter settings do not affect generality. The methods compared with the proposed method are: (1) a digital twin-assisted Internet of Vehicles edge computing task offloading decision method based on single-objective proximal policy optimization; (2) an Internet of Vehicles edge computing task offloading decision method based on multi-objective proximal policy optimization; (3) an Internet of Vehicles edge computing task offloading decision method based on single-objective proximal policy optimization; (4) an Internet of Vehicles edge computing task offloading decision method based on the actor-critic algorithm.

The vehicle edge network model is shown in FIG. 2. There are 12 vehicle users and 1 MEC server in total, the base station covers the full road, and the road is 1 km long. Start points are spaced 10 m apart, one vehicle per start point, and the vehicles travel in one direction at a constant speed of 50 km/h. The data size, computation frequency and computation density of the vehicle tasks are all random values within set ranges. The main simulation parameters are shown in Table 1.

TABLE 1 Main simulation parameters
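The geometry stated for the embodiment (12 vehicles, a 1 km road, start points 10 m apart, 50 km/h) fixes how many time slots a round lasts once a slot duration is chosen. The Python sketch below works this out under two assumptions that are not stated in the text: the slot duration is 1 s, and the first vehicle starts at 0 m.

```python
import math

N_VEHICLES = 12
ROAD_LENGTH_M = 1000.0
START_SPACING_M = 10.0
SPEED_KMH = 50.0
SLOT_S = 1.0  # hypothetical slot duration; not stated in the text

# Assumption: the first vehicle starts at 0 m, the others every 10 m after it.
starts = [i * START_SPACING_M for i in range(N_VEHICLES)]


def slots_to_finish(start_m):
    """Number of whole slots a vehicle needs to reach the end of the road."""
    remaining_m = ROAD_LENGTH_M - start_m
    # Multiply before dividing so that exact cases stay exact in floating point.
    return math.ceil(remaining_m * 3600.0 / (SPEED_KMH * 1000.0 * SLOT_S))


slots_needed = [slots_to_finish(s) for s in starts]
round_length = max(slots_needed)  # a round ends when the last vehicle arrives
print(starts[-1], slots_needed[0], slots_needed[-1], round_length)
# 110.0 72 65 72
```

Under these assumptions a round lasts 72 slots, set by the vehicle that starts at the beginning of the road.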

As shown in FIG. 3, compared with the comparison schemes, the proposed method converges fastest, reaches the highest reward value, and is the most stable and best-performing after convergence. This verifies that the method can minimize the long-term task offloading cost in the trade-off among multiple objectives such as delay, energy consumption and cloud computing cost, and achieve low-cost, high-efficiency intelligent task offloading.

By comparison, the offloading decision method based on the actor-critic algorithm converges slowly and has the lowest reward after convergence, because it has difficulty handling high-dimensional state and action spaces. The method based on single-objective proximal policy optimization uses the clip method to limit the size of each policy update, which avoids excessively large updates and makes the algorithm converge faster and more stably; however, because it does not consider multi-objective optimization, it performs worse than the method based on multi-objective proximal policy optimization. One advantage of the digital twin is that it helps reduce errors in long-range information transmission, and the digital twin-based methods are slightly better than those without a digital twin.

As shown in FIG. 4, the loss value of the proposed method is smaller than that of the other methods and its loss curve fluctuates less after convergence, indicating a more stable and better-performing method. This is because the proposed method combines the advantages of the decomposition-based multi-objective evolutionary algorithm and proximal policy optimization, which makes it more efficient and stable and better able to achieve collaborative multi-objective optimization.

As shown in FIGS. 5-7, as the task data size, the number of vehicle users and the number of MEC servers increase, the average converged reward of every scheme gradually decreases, because learning becomes more complex and algorithm performance drops. Across all of these parameter changes, the proposed method shows the smallest decline and the highest reward, which further verifies its superiority under complex vehicular network conditions.

The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles, characterized by comprising the following steps:

2. The digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles of claim 1, wherein the digital twin-based edge Internet of Vehicles environment input in step 1 comprises:

(2) Network model: a two-layer digital twin Internet of Vehicles edge network model is established, consisting of a physical entity layer and a digital twin layer. Suppose that the physical entity layer includes N vehicle users and one base station equipped with an MEC server. The digital twin (DT) of vehicle n in time slot t is represented by two quantities: the DT's estimate of the actual task computation frequency of vehicle n in time slot t, and f'_n^t, the error between that actual task computation frequency and the DT's estimate. The digital twins of all vehicles, the base station and the MEC server constitute the digital twin layer.

The task computation time of vehicle n in time slot t is

The task transmission time of vehicle n in time slot t is

The task transmission energy consumption of vehicle n in time slot t is

The cloud computing cost of the task of vehicle n in time slot t is

where μ is a price factor related to the cloud service provider.

3. The digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles of claim 2, wherein in step 2, in the current time slot each vehicle user selects an offloading scheme according to the action generated in the previous time slot, obtains the reward vector and the completion flag fed back by the environment, and stores them in the buffer, specifically:

(1) Action of vehicle user

The action vector selected by the vehicle users in time slot t can be expressed as

a(t) = [a_1(t), a_2(t), …, a_N(t)]

For the action a_n(t) of vehicle n in time slot t, a value of 0 indicates local computation, a value of 1 indicates computation on the MEC server, and a value of 2 indicates computation in the cloud.

(2) System rewards

(3) Completion flag

4. The digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles of claim 3, wherein in step 3 the value function is fitted to the buffer data by mean square error regression, and the advantage estimate is calculated based on the current value function and the discounted return, specifically:

The advantage estimation function is expressed as

5. The digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles of claim 4, wherein in step 4 the solution set and the fitness values are updated using the decomposition-based multi-objective evolutionary algorithm and the reward vector, and the Pareto-optimal solution set that optimizes the fitness value is returned as the learning parameters of the actor network, specifically:

f = max_i (ω_i · |F[i] − z_i|)

6. The digital twin-based multi-objective decision method for edge computing offloading in the Internet of Vehicles of claim 5, wherein in step 5 the policy update is constrained by a clipping (truncation) method, the loss function of the actor-critic network is calculated and the policy is updated, specifically:

where clip(·) is the truncation function.