CN116321298A - Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Info

Publication number: CN116321298A
Application number: CN202310318141.7A
Authority: CN (China)
Prior art keywords: task, vehicle, value, calculation, time delay
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Ma Qiang (马强), He Jie (何杰), Xing Ling (邢玲), Gao Jianping (高建平), Wu Honghai (吴红海)
Assignee (current and original): Southwest University of Science and Technology
Application CN202310318141.7A filed by Southwest University of Science and Technology, priority to CN202310318141.7A; publication of CN116321298A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/02: Traffic management, e.g. flow control or congestion control
    • H04W28/08: Load balancing or load distribution
    • H04W28/09: Management thereof
    • H04W28/0925: Management thereof using policies
    • H04W28/0958: Management thereof based on metrics or performance parameters
    • H04W28/0967: Quality of Service [QoS] parameters
    • H04W28/0975: Quality of Service [QoS] parameters for reducing delays
    • H04W28/0983: Quality of Service [QoS] parameters for optimizing bandwidth or throughput
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H04W4/40: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]


Abstract

The invention discloses a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles. The strategy comprehensively considers factors such as task type, data volume, computing resources and geographical environment, takes multiple indexes (task timeout rate, vehicle energy consumption, resource lease cost and server load balancing coefficient) as optimization targets, and takes improving the total benefit of the system as its final aim. The strategy builds a system decision model on an improved TD3 deep reinforcement learning algorithm and constructs a cluster-based multi-task, multi-objective partial-unloading environment. To improve decision accuracy, a dual-noise policy network is proposed to raise the exploration rate of the agent, and a hybrid action space is adopted to improve the adaptability of the algorithm. Against the problems of a single optimization target and poor environmental adaptability in existing strategies, the invention adaptively completes calculation tasks by analyzing the characteristics of the task, the vehicle and the edge computing environment, effectively reduces the task unloading cost of the Internet of Vehicles, and improves resource utilization.

Description

Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles
Technical Field
The invention belongs to the field of Internet of Vehicles edge computing, and particularly relates to a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles.
Background
Internet of Vehicles edge computing is a technology that places computing and storage resources on edge nodes close to the vehicles, with the aim of reducing data transmission delay and network bandwidth consumption and thereby improving the real-time performance and reliability of the Internet of Vehicles system. Task unloading is one of the key technologies of edge computing: vehicle-mounted calculation tasks are handed to edge nodes for processing, which effectively reduces the on-board computing burden and improves on-board computing performance and efficiency. Research on Internet of Vehicles edge-computing task unloading aims to improve the performance and efficiency of the Internet of Vehicles system by solving problems of task scheduling, resource management and load balancing. However, the prior art suffers from insufficient resources, changeable environments and complex decisions, which lead to high time delay, high energy consumption and low resource utilization during task unloading. Therefore, to advance the research and application of vehicle edge computing and task unloading, theoretical exploration and practical innovation must be strengthened, and the related technologies and algorithms continuously improved, so as to enhance the performance and benefits of the Internet of Vehicles system.
Related research on task offloading strategies in the field of Internet of Vehicles edge computing already exists, but most results focus on a single optimization objective or a single user. Liu Guozhi et al. [Liu Guozhi, Dai Fei, Mo Qi, et al. Computer Integrated Manufacturing Systems, 2022, 28(10): 12] propose an end-edge-cloud collaborative service offloading architecture in a vehicle edge computing environment and adopt a Deep Q-Network (DQN) based task offloading method with minimization of the average service delay as the optimization objective, effectively reducing the task processing delay under the computing and communication resource constraints of the edge servers. However, the method has a single optimization objective and does not consider other factors that may affect the task offloading decision.
Xiaolong Xu et al. [Xu X, Huang Q, Zhu H, et al. Secure Service Offloading for Internet of Vehicles in SDN-Enabled Mobile Edge Computing. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3720-3729] present a security service offloading framework based on SDN (Software Defined Networking) and mobile edge computing techniques. However, the framework's emphasis is on the communication security and cooperation efficiency of interconnected vehicles, and its security aspects require more detailed investigation.
Hansong Wang et al. [Wang H, Li X, Ji H, et al. Federated Offloading Scheme to Minimize Latency in MEC-Enabled Vehicular Networks. 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates: IEEE, 2018: 1-6] propose a joint offloading scheme to minimize overall latency. Tasks are divided into three parts (local computation, edge computation and neighboring-vehicle computation), and the allocation proportions among the three parties are tuned so that the whole task completes with the shortest delay. The scheme effectively improves the utilization of computing resources and reduces task delay, but it too suffers from a single optimization target and poor environmental adaptability.
While the above related work addresses, to some extent, several key issues of task unloading in Internet of Vehicles edge computing, it still has shortcomings. First, most of these methods have a single optimization target: they do not comprehensively consider, at the system level, the multiple factors affecting task unloading, cannot effectively improve the overall benefit of the system, and may be difficult to apply in practice. Second, these methods do not account for the fact that the nature of tasks and the needs of users may change continuously, so they may be unable to adapt quickly to such changes. Therefore, research on Internet of Vehicles task unloading strategies must, on the one hand, consider not only user demands and service quality but also the various factors influencing the overall benefit of the system; on the other hand, the algorithm itself must have high environmental adaptability and be able to adapt quickly to the continuously changing Internet of Vehicles environment. The invention starts from these two aspects and establishes a task unloading strategy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in an Internet of Vehicles environment. The strategy can effectively improve the completion rate of tasks under a specified time-delay requirement, reduce the task unloading cost and improve the utilization rate of system resources.
In order to achieve the above purpose, the multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the internet of vehicles of the invention comprises the following steps:
s1.1, constructing a vehicle edge computing task unloading system model based on clusters:
aggregation of Internet of vehicles components in a certain area into cluster C s S epsilon {1, 2..s }, each cluster is independent of the other, each cluster can be regarded as an independent individual, and is mainly composed of the following four parts: (1) Main control base station integrating SDN
(2) M base stations providing edge computing services, $\{b_1, b_2, \dots, b_M\}$, each base station bound with an edge server; (3) $N_s$ task vehicles inside $C_s$, $\{v_1, v_2, \dots, v_{N_s}\}$; and (4) a cloud server $C_{server}$. The control center performs task unloading and resource scheduling control through a task unloading instruction, defined as:

$A = \{a_{node}, a_{ratio}, a_{lease}, a_{power}\}$

where the elements of the set respectively represent the task unloading node, the task unloading proportion, the resource lease proportion and the signal transmitting power.

The control center determines the task unloading instruction according to decision factors, which comprise the task time delay, vehicle energy consumption, resource lease cost and server load penalty value. The weight coefficients of the four parts are set in sequence as $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ and satisfy:

$0 < \lambda_i < 1, \qquad \sum_{i=1}^{4} \lambda_i = 1$
s1.2, building a basic model of each component:
s1.2.1, establishing a basic model of the task vehicle:
$v_i = \{i, p_i, f_i, k_i\}$

where $i$ is the serial number of the task vehicle, $p_i$ is the relative position of the vehicle, $f_i$ is the computing frequency of the vehicle, and $k_i$ is the computing power coefficient of the vehicle-mounted OBU;

S1.2.2 Establish the task model of the task vehicle:

$t_j = \{j, in_j, cal_j, t_j^{max}\}$

where $j$ is the task level, $j \in \{1, 2, \dots, J\}$, with $J$ levels in total; $in_j$ is the task input data amount; $cal_j$ is the task computation amount; and $t_j^{max}$ is the maximum expected time delay of the task. The higher the task level, the stricter the time-delay requirement and the higher the corresponding priority during task unloading;
s1.2.3 builds a basic model of the serving base station:
$b_m = \{m, f_m, p_m, l_m\}$

where $m$ is the base station number, $m \in \{1, 2, \dots, M\}$; $f_m$ is the computing frequency of the server; $p_m$ is the computing unit price, whose value is positively correlated with the real-time computing frequency of the task; and $l_m$ is the load factor of the base station;
s1.2.4 builds a basic model of the cloud server:
$c_c = \{f_c, p_c, t_c\}$

where $f_c$ is the computing frequency, $p_c$ is the computing unit price, and the data transmission delay $t_c$ follows the Gaussian distribution $t_c \sim \mathcal{N}(u_c, \sigma_c^2)$;
S1.3, establishing a task unloading model:
based on 5G cellular communication, a base station increases data transmission capability by using a Massive antenna array (Massive MIMO) technology, and models Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication by combining a Non-orthogonal frequency division multiple access (NOMA) technology. V2V communication is carried out between vehicles through a PC5 interface; V2I communication is carried out between the vehicle and the base station through a Uu interface; the base station, the control center and the cloud server are communicated with each other mainly through optical fibers; the base stations perform bidirectional interaction through a control center; task offloading two modes, one is total offloading; the second is to unload part, unload part according to task separability according to certain division rule unload part calculate task to the goal server to carry out; the target servers are a plurality of edge servers with higher performance and high-performance cloud servers with higher transmission delay;
the data transmission of the Internet of vehicles is carried out through the base station, and the base station is allocated with certain bandwidth resources which can be allocated to different vehicles for data transmission; to describe the fading characteristics of a multipath channel, the communication channel is modeled as a rayleigh channel; the allocation of the bandwidth resources by the base station needs to consider a plurality of factors, such as the communication requirement of the vehicle, the network congestion degree, the load of the base station and the like; the data transmission rate is defined as:
Figure BDA0004150744980000041
wherein the method comprises the steps of
Figure BDA0004150744980000042
Indicating base station b m Assigned to vehicle v i Bandwidth ρ of i Signal transmission power, delta, representing vehicle i Signal gain, sigma, representing the environment in which the vehicle is located m Representing standard deviation of white gaussian noise in the environment;
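For illustration, the rate model can be evaluated directly. The following minimal Python sketch computes this rate; the function and parameter names are chosen for readability and are not part of the patent:

```python
import math

def transmission_rate(bandwidth_hz: float, tx_power_w: float,
                      channel_gain: float, noise_std: float) -> float:
    """Shannon-capacity rate of the V2I link in bits/s.

    bandwidth_hz : bandwidth B_i^m the base station assigns to the vehicle
    tx_power_w   : vehicle signal transmission power rho_i (watts)
    channel_gain : environment signal gain delta_i (dimensionless)
    noise_std    : standard deviation sigma_m of the Gaussian white noise
    """
    snr = tx_power_w * channel_gain / noise_std ** 2
    return bandwidth_hz * math.log2(1.0 + snr)

# Example: a 10 MHz allocation, 0.5 W transmit power, 1e-9 channel gain
# and 1e-6 noise standard deviation give roughly 90 Mbit/s.
print(transmission_rate(10e6, 0.5, 1e-9, 1e-6))
```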
task offloading can be divided into local computing, edge computing and cloud computing according to different offloading modes and computing nodes;
s1.3.1 a local calculation model is built:
the time delay of local calculation is mainly task decision time delay and calculation time delay, and the task decision time delay t dec Satisfy Gaussian distribution
Figure BDA0004150744980000043
Figure BDA0004150744980000044
Calculating time delay:
Figure BDA0004150744980000045
and (3) locally calculating energy consumption:
Figure BDA0004150744980000046
defining a local calculation cost:
Figure BDA0004150744980000051
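A minimal Python sketch of this local cost follows. It assumes the effective-capacitance energy model named above and approximates the normalization operator by division by fixed upper bounds; all names are illustrative:

```python
def local_cost(cal_j: float, f_i: float, k_i: float, t_dec: float,
               lam1: float, lam2: float,
               t_max: float, e_max: float) -> float:
    """Weighted local-computation cost under the models above.

    A sketch only: the energy term uses the assumed k_i * f^2 * cycles
    model, and min-max normalisation is approximated by dividing each
    term by an upper bound (t_max, e_max), standing in for the tilde
    operator of S1.4.
    """
    t_loc = t_dec + cal_j / f_i       # decision delay + computation delay
    e_loc = k_i * f_i ** 2 * cal_j    # local computation energy consumption
    return lam1 * t_loc / t_max + lam2 * e_loc / e_max
```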
s1.3.2 an edge calculation model is built:
the time delay of the edge calculation is mainly task decision time delay, task uploading time delay, task transfer time delay and task execution time delay, and the feedback time delay of the calculation result is ignored;
Figure BDA0004150744980000052
edge calculation energy consumption:
Figure BDA0004150744980000053
edge computing rental fee:
Figure BDA0004150744980000054
edge calculation load penalty:
Figure BDA0004150744980000055
defining edge computation cost:
Figure BDA0004150744980000056
wherein L represents the length of the lane; t is t loss Representing the transmission loss rate;
Figure BDA0004150744980000057
representing task t j Is the actual calculation rate of (a), edge server b m Calculation rate and load rate l of (2) m Related to; a, b, c are coefficients of a binary linear function with respect to the load penalty value;
s1.3.3 an edge calculation model is built:
the time delay of cloud computing is mainly task decision time delay, task uploading time delay, task transmission time delay and task computing time delay, and the result feedback time delay is ignored;
Figure BDA0004150744980000058
task calculation time delay:
Figure BDA0004150744980000061
cloud computing rental fees:
Figure BDA0004150744980000062
defining cloud computing costs:
Figure BDA0004150744980000063
s1.4, establishing a multi-objective joint optimization model:
the system cost, namely the optimization target, is defined as the weighted sum of task delay, vehicle energy consumption, resource lease cost and server load penalty value;
Figure BDA0004150744980000064
wherein lambda is i Satisfy 0 < lambda i <1,
Figure BDA0004150744980000065
The symbol (x) represents the normalized prize value;
the task offloading policy is defined as: solving a set of service strategies that minimizes average system cost, i.e., maximizes rewards or system benefits, over a long period of time, thus the problem can be modeled as:
Figure BDA0004150744980000066
Figure BDA0004150744980000067
Figure BDA0004150744980000068
Figure BDA0004150744980000069
Figure BDA00041507449800000610
Figure BDA00041507449800000611
Figure BDA00041507449800000612
where n (τ) represents the number of mission vehicles at τ slots; d (D) i Representing vehicle v i Task processing energy consumption of (a); c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Resource lease costs of (2); l (L) i Is shown in the process of handling vehicle v i Server load balancing coefficients during the task; (a) The value range of each bonus weight coefficient is 0,1]And the sum is 1; (b) The value range of each sub-action is [0,1 ] for any action]The method comprises the steps of carrying out a first treatment on the surface of the (c) Indicating that the ratio of task offloading and the ratio of resources allocated by the server m for any vehicle at any time slot is [0,1 ]]The method comprises the steps of carrying out a first treatment on the surface of the (d) Indicating that the sum of the proportion of resources allocated by the server m in any time slot is less than or equal to 1; (e) When the task is edge calculation, the task unloading proportion and the resource lease proportion are not equal to 0; (f) When the task is cloud computing, the task unloading proportion is 1;
s1.5 task unloading decision agent training and evaluation:
the invention adopts a depth reinforcement learning algorithm based on an improved dual-delay depth deterministic strategy gradient algorithm (Twin Delayed Deep Deterministic Policy Gradient, TD 3) to carry out task unloading and resource scheduling decision; TD3 is a deep reinforcement learning algorithm for continuous control tasks, which has six networks in total, including two value networks q (s, a; w) i ) I=1, 2 and one policy network u (s; θ) and corresponds to one target network respectively; randomly initializing value network parameters w i And policy network parameters theta, and respectively assigning to the target network
Figure BDA0004150744980000071
And theta -
S1.5.1 environmental input state preprocessing:
the environmental state is represented by vectors
Figure BDA0004150744980000072
Indicating (I)>
Figure BDA0004150744980000073
The system is a multidimensional vector, and consists of calculation frequencies and load coefficients of a plurality of edge servers, vehicle positions, calculation frequencies and calculation power of a vehicle, the input data size of tasks, required calculation amount and maximum time delay:
Figure BDA0004150744980000074
since the sizes of the elements in the vectors are different and the orders of magnitude are also different, the state normalization is needed, and the specific method is implemented for each vector element:
Figure BDA0004150744980000075
wherein I represents an environmental state vector
Figure BDA0004150744980000076
Dimension of->
Figure BDA0004150744980000077
And->
Figure BDA0004150744980000078
Respectively representing an upper limit value and a lower limit value of the vector element;
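A short Python sketch of this element-wise min-max normalization, with illustrative names:

```python
import numpy as np

def normalize_state(s, s_min, s_max):
    """Element-wise min-max normalisation of the environment state vector,
    as in S1.5.1; s, s_min and s_max are arrays of equal length holding the
    raw state and its per-dimension lower and upper limit values."""
    s = np.asarray(s, dtype=np.float64)
    return (s - np.asarray(s_min)) / (np.asarray(s_max) - np.asarray(s_min))
```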
s1.5.2 a hybrid motion space:
the neural network outputs a vector
Figure BDA0004150744980000079
Indicating (I)>
Figure BDA00041507449800000710
The system is a 4-dimensional vector, represents task offloading decision, and respectively represents offloading node, offloading proportion, resource lease proportion, signal transmitting power, neural network output takes tanh as an activation function, and then the neural network output is converted into decision action of an agent:
Figure BDA00041507449800000711
Figure BDA0004150744980000081
wherein the method comprises the steps of
Figure BDA0004150744980000082
For neural network original output actions, < >>
Figure BDA0004150744980000083
Acts when the intelligent agent actually interacts with the environment; for a pair of
Figure BDA0004150744980000084
And (3) further processing: n (N) num Unloading the number of nodes for the available tasks in the environment, wherein round (x, 0) represents rounding x to an integer, so that the action space is converted from the continuous action space to the mixed action space through continuous action discretization, and the application range of TD3 is expanded; p (P) min And P max Respectively representing a lower limit value and an upper limit value of the vehicle communication power;
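The conversion from the raw tanh output to the hybrid action space can be sketched in Python as follows; the exact affine mappings are assumptions consistent with the description above, not formulas quoted from the patent:

```python
import numpy as np

def to_env_action(raw, n_nodes, p_min, p_max):
    """Map a tanh policy output in [-1, 1]^4 onto the hybrid action space.

    raw = (node, offload_ratio, lease_ratio, tx_power). The node component
    is discretised with round(), the two ratios are rescaled to [0, 1],
    and the power is rescaled to [p_min, p_max].
    """
    unit = (np.asarray(raw, dtype=float) + 1.0) / 2.0   # rescale to [0, 1]
    node = int(round(unit[0] * (n_nodes - 1)))          # discrete node index
    ratio, lease = float(unit[1]), float(unit[2])       # continuous sub-actions
    power = p_min + float(unit[3]) * (p_max - p_min)    # transmit power level
    return node, ratio, lease, power
```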
s1.5.3 prize value normalization:
the rewarding value consists of task time delay, vehicle energy consumption, resource leasing fee and load rewarding of the server, each sub rewarding value is normalized, and then the final rewarding value is obtained by calculating and summing by a scaling factor:
Figure BDA0004150744980000085
Figure BDA0004150744980000086
wherein r (i) max Upper limit value of sub-prize value, w i Scaling factors for the child prize values;
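A one-function Python sketch of this normalized, scaled reward, with the sign flipped so that a smaller cost yields a larger reward (consistent with the reward definition used later in the embodiment); names are illustrative:

```python
def reward(sub_values, sub_maxima, weights):
    """Normalised, scaled reward (S1.5.3): each sub-term (delay, energy,
    rental fee, load penalty) is divided by its upper bound, weighted by
    its scaling factor, summed, and negated."""
    return -sum(w * v / m for v, m, w in zip(sub_values, sub_maxima, weights))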
s1.5.4 a dual noise strategy network is introduced to obtain the experience trace of the agent:
the interaction process of the agent and the environment can be divided into two stages of exploration and utilization: in the exploration phase, the agent explores unknown states and actions in the environment by taking unknown, random strategies; in order to increase the exploration rate, the invention introduces a double-noise interference strategy, namely the output noise of a strategy network; secondly, environmental noise when the intelligent body interacts with the environment;
Figure BDA0004150744980000087
Figure BDA0004150744980000088
wherein u is a
Figure BDA0004150744980000089
Mean and standard deviation of policy network noise are respectively represented, u env
Figure BDA00041507449800000810
Respectively representing the mean value and standard deviation of the environmental noise;
combining strategy noise and environment interaction by the intelligent agent, accumulating track data meeting a specified threshold, recording the state, action, rewards and next state of each track by the intelligent agent, and integrating all track data into an experience pool for training;
s1.5.5 decision agent training:
splitting the experience pool obtained by S1.5.4 into small batches for training a TD3 model and training a strategy network: updating the parameter theta to maximize the value network evaluation value Q, and calculating the gradient through a chain rule:
Figure BDA0004150744980000091
updating the parameter θ by gradient ascent:
θ new ←θ now +β·g
particularly, alpha and beta are learning rates, gamma is discount rate, and the learning rates and the gamma are super parameters which need to be manually adjusted; the new mark represents the updated parameters of the network;
updating value network parameters; the TD error is the difference between the predicted value and the TD target:
Figure BDA0004150744980000092
defining a loss function as the mean square error of the predicted value and the smaller TD target in the two target value networks:
Figure BDA0004150744980000093
updating the value network parameters by adopting gradient descent;
Figure BDA0004150744980000094
policy delay update: after the predicted network is updated for h rounds, the parameters of the target network are updated,
Figure BDA0004150744980000095
phi is the weight ratio of the new parameter to the old parameter:
Figure BDA0004150744980000096
Figure BDA0004150744980000097
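The update equations above can be condensed into one PyTorch training step. The sketch below is a simplified rendering under stated assumptions (the value networks take a state-action pair, and one optimizer covers the parameters of both value networks); it is not the patent's exact implementation:

```python
import torch

def td3_update(q1, q2, pi, q1_t, q2_t, pi_t, batch,
               opt_q, opt_pi, gamma, phi, step, h):
    """One mini-batch update of the (simplified) S1.5.5 loop.

    q1/q2 are the value networks, pi the policy network, *_t their targets;
    `batch` is a (s, a, r, s2) tuple of tensors from the experience pool.
    """
    s, a, r, s2 = batch
    with torch.no_grad():
        y = r + gamma * torch.min(q1_t(s2, pi_t(s2)),
                                  q2_t(s2, pi_t(s2)))      # smaller TD target
    loss_q = ((q1(s, a) - y) ** 2).mean() + ((q2(s, a) - y) ** 2).mean()
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()     # gradient descent

    if step % h == 0:                                      # delayed policy update
        loss_pi = -q1(s, pi(s)).mean()                     # ascend on Q
        opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
        with torch.no_grad():                              # soft target update
            for net, tgt in ((q1, q1_t), (q2, q2_t), (pi, pi_t)):
                for p, p_t in zip(net.parameters(), tgt.parameters()):
                    p_t.mul_(1 - phi).add_(phi * p)
```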
The network training effect is tested each time the parameters are updated; S1.5.4 and S1.5.5 are then executed repeatedly until a preset exploration-step threshold is reached.

After network training is finished, performance evaluation indexes are designed to verify the effectiveness of the algorithm;
s1.5.6 decision agent evaluation:
and (3) evaluating the agent obtained by S1.5.5 through a strategy performance index, wherein the performance index has the following calculation formula:
(1) Average timeout rate:
Figure BDA0004150744980000098
(2) Average energy consumption:
Figure BDA0004150744980000101
(3) Average cost:
Figure BDA0004150744980000102
(4) Average load balancing coefficient:
Figure BDA0004150744980000103
wherein t is out (τ) represents the number of tasks for which slot τ times out; n (τ) represents the number of slot τ mission vehicles; t (T) end Representing the total number of task time slots; a (i) ∈ {0,1}, representing local computation when a (i) =0, and unloading computation when a (i) =1;
Figure BDA0004150744980000104
representing the load average value of the server when a certain task is unloaded;
in order to visually display the importance duty ratio of each performance index in a task unloading process and the overall system performance, the invention designs a system benefit function related to each time slot
Figure BDA0004150744980000105
The calculation formula is as follows:
Figure BDA0004150744980000106
wherein D is i Representing vehicle v i Is a task processing delay; c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Task processing costs of (a); l (L) i Is shown in the process of handling vehicle v i The server load balancing coefficient when the task is performed, the system benefit function is a negative number, and the larger the value is, the better the system performance is.
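A compact Python sketch of this per-slot benefit function, assuming the four per-vehicle terms are already normalized; names are illustrative:

```python
import numpy as np

def system_benefit(D, C, F, Lb, lam):
    """Per-slot system benefit U(tau): the negated weighted mean of the
    (already normalised) delay, energy, fee and load terms over the n(tau)
    task vehicles of the slot; values closer to 0 indicate better performance."""
    terms = np.stack([D, C, F, Lb])       # shape (4, n_tau)
    return -float(np.mean(np.asarray(lam) @ terms))
```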
Aiming at the problems of task unloading and resource scheduling in the Internet of Vehicles, the invention provides a multi-objective joint optimization task unloading strategy based on deep reinforcement learning. Targeting low time delay, low energy consumption, low cost and high resource utilization, the invention designs a multi-objective joint optimization strategy over the task time delay, vehicle energy consumption, resource lease fees and server load balancing coefficient. The control center first collects environment information (such as the available computing resources and load information of the MEC servers) and vehicle task request information (such as vehicle and task information); the SDN controller then makes task unloading and resource scheduling decisions. As task vehicles continuously feed back data, the control center continuously improves the decision network and rapidly adapts to the ever-changing edge computing environment.
The invention builds a system decision model based on an improved TD3 deep reinforcement learning algorithm and constructs a cluster-based multi-task, multi-objective partial-unloading environment. To improve decision accuracy, a dual-noise policy network is proposed to raise the exploration rate of the agent, and a hybrid action space is adopted to improve the adaptability of the algorithm. In addition, the method is applicable not only to the Internet of Vehicles but also to Internet of Things devices with edge computing requirements, and thus has broad application prospects. The method significantly improves the completion rate of tasks under a specified time-delay requirement, effectively reduces the task unloading cost and improves the utilization rate of system resources, thereby improving user satisfaction and system benefit.
Drawings
FIG. 1 is a simplified flow diagram of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles of the present invention;
FIG. 2 is a schematic diagram of a system model of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 3 is a schematic diagram of an unloading flow of a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 4 is an unloading flow chart of a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 5 is a comparison diagram of different scheme training of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 6 is a time-out rate versus histogram of a multi-objective joint optimization task offloading strategy and comparison scheme based on deep reinforcement learning in the Internet of vehicles of the present invention;
FIG. 7 is a graph of average system benefit versus histogram of a multi-objective joint optimization task offloading strategy and comparison scheme based on deep reinforcement learning in the Internet of vehicles of the present invention;
Detailed Description
To better illustrate the technical effects of the invention, the invention is simulated and verified with a specific example, and specific embodiments are described in conjunction with the drawings so that those skilled in the art can better understand the invention. It is expressly noted that in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Examples:
taking a urban road with a length of 600 meters as an example, 3 equally-spaced 5G cellular base stations are distributed on the roadside, each base station is provided with a server with different computing performances, the coverage radius of each base station is 100 meters, the spacing between the base stations is 200 meters, and the specific implementation steps of the multi-optimization task unloading strategy in the vehicle edge computing environment comprise:
s101 decision agent training
The task unloading decision-making agent is trained based on the method. The decision network is designed as a 5-layer fully connected network, comprising 3 hidden layers, 1 input layer and 1 output layer. The number of neurons in each layer is specifically (12X 256X 4),
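A minimal PyTorch sketch of a network with these layer sizes; the class name and the choice of ReLU between hidden layers are assumptions, as the embodiment specifies only the layer widths and the tanh output:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """5-layer fully connected policy network (12-256-256-256-4) with a
    tanh head producing raw actions in [-1, 1]."""
    def __init__(self, state_dim: int = 12, action_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

# Usage: PolicyNet()(torch.zeros(1, 12)) -> tensor of shape (1, 4) in [-1, 1]
```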
data for network training originates from agent interactions with the environment, each data containing 4 values
Figure BDA0004150744980000121
Wherein->
Figure BDA0004150744980000122
The current state of the environment is represented, and the current state consists of the calculation frequency and the load coefficient of 3 edge servers, the vehicle position, the calculation frequency and the calculation power of the vehicle, the input data size of tasks, the required calculation amount and the maximum time delay;
Figure BDA0004150744980000123
The method comprises the steps of representing actions executed by an intelligent agent, and forming four parts of a task unloading node, a task unloading proportion, a resource leasing proportion and signal transmitting power; r represents rewards, and the agent is in status +.>
Figure BDA0004150744980000124
Execution of action down->
Figure BDA0004150744980000125
After this, r can be derived from the environment and enter the next state +.>
Figure BDA0004150744980000126
The intelligent agent judges whether the decision is good or bad through r, and awardsThe incentive is defined as the opposite of the offload cost, the smaller the cost, the greater the incentive:
Figure BDA0004150744980000127
the unloading cost consists of task time, vehicle energy consumption, resource lease cost and server load penalty value;
and dividing a continuous period of time into a plurality of discrete time nodes to obtain a system benefit function, and determining an optimal task unloading strategy according to the value and the distribution interval of the benefit function.
Figure BDA0004150744980000128
This example trains the decision agent based on the improved TD3 algorithm, which has six neural networks in total, comprising two value networks $q(s, a; w_i)$, $i \in \{1, 2\}$, and one policy network $u(s; \theta)$, each with a corresponding target network; the value network parameters $w_i$ and the policy network parameters $\theta$ are randomly initialized and assigned to the target networks. Combining the policy noise, the agent interacts with the environment, accumulates trajectory data up to a specified threshold and puts it into the experience pool, and then extracts data to start policy network training: update the parameters $\theta$ so that the value network evaluation $Q$ is maximized. The gradient is computed by the chain rule:

$g = \nabla_\theta \dfrac{1}{B} \sum_{s} q\big(s, u(s; \theta); w_1\big)$

Update the parameter $\theta$ by gradient ascent:

$\theta_{new} \leftarrow \theta_{now} + \beta \cdot g$

Update the value network parameters. The TD error is the difference between the predicted value and the TD target, where the TD target uses the smaller of the two target value networks:

$\delta_i = q(s, a; w_i) - y, \qquad y = r + \gamma \min_{j=1,2} q\big(s', u(s'; \theta^-); w_j^-\big)$

Define the loss function as the mean square error between the predicted value and the TD target:

$L(w_i) = \dfrac{1}{2B} \sum \big(q(s, a; w_i) - y\big)^2$

Update the value network parameters by gradient descent:

$w_{i,new} \leftarrow w_{i,now} - \alpha \cdot \nabla_{w_i} L(w_i)$

Policy delayed update: after the prediction networks have been updated for $h$ rounds, the parameters of the target networks are updated, with $\phi$ being the weight ratio of the new parameters to the old:

$\theta^- \leftarrow \phi \, \theta + (1 - \phi) \, \theta^-$

$w_i^- \leftarrow \phi \, w_i + (1 - \phi) \, w_i^-$
and testing the network training effect once every time the parameters are updated. The agent continuously explores the environment, thereby continuously updating the experience pool, and continuously updating the network parameters until reaching the preset exploration threshold. After training, the control center deploys decision-making agent.
S102: Task unloading request issue:

Task vehicle $v_i$ transmits its vehicle information and basic task information $I\{v, t\}$ to the control center through the nearest base station. The vehicle information is expressed as $v_i = \{i, p_i, f_i, k_i\}$ and the basic task information as $t_j = \{j, in_j, cal_j, t_j^{max}\}$.
s103: the control center collects information and makes decisions:
when the control center receives the unloading request, firstly, the key information in the request is extracted, and the states of the edge servers in the current time slot are integrated to form a state information vector together
Figure BDA0004150744980000138
Then inputting the motion vector into decision network to obtain a motion vector
Figure BDA0004150744980000139
The action vector is arranged to obtain an unloading scheme and a resource scheduling scheme of the task, and the unloading scheme and the resource scheduling scheme are respectively sent to the task request node and the service node.
S104: uploading data to complete the calculation task:
the task request node sends the complete information of the task to a designated service node according to an unloading scheme; and the service node executes the calculation task according to the resource scheduling scheme after receiving the task information, and returns the calculation result to the task vehicle.
S105: vehicle feedback information:
after the task is completed, the task vehicle feeds back the unloading result to the control center, and the control center integrates the feedback with the previous task request information and stores the integrated feedback and the previous task request information into a local database so as to update the decision network for use.
To verify the effectiveness and practicality of the invention, several comparison schemes were designed, as follows:
scheme 1: task random computation (Randomized Computing). The random calculation of the task means that the values of the task unloading node, the unloading proportion, the calculated lease proportion and the vehicle transmitting power are determined in a random mode in the task unloading process.
Scheme 2: The task is calculated locally (Local). The only computing participant is the task vehicle itself.
Scheme 3: Tasks are computed by multiple parties and task decisions are made based on DQN. DQN (Deep Q-Network) is a classical deep reinforcement learning algorithm for problems with discrete action spaces, widely used in reinforcement learning scenarios including video games and robot control.
Scheme 4: Tasks are computed by multiple parties and task decisions are made based on SAC. SAC (Soft Actor-Critic) is a deep reinforcement learning algorithm based on the maximum-entropy principle, designed for problems with continuous action spaces.
Scheme 5: Tasks are edge-computed and task decisions are made based on the algorithm herein (MEC_TD3). This scheme is a variant of the method of the invention in which task unloading decisions do not use the vehicle's own resources: the task unloading proportion is fixed to 1.
The method of the present invention is scheme 6. Table 1 lists the relevant experimental parameters. Different vehicle service scenarios, such as automatic driving and online entertainment, are simulated by designing different delay indexes, computation amounts and input data sizes for tasks; the diversity of environments is simulated by designing different vehicle positions and performances, and edge servers with different performances and load rates.
Table 1: Related experimental parameters (the table is reproduced only as an image in the source document).
To better illustrate the technical effects of the invention, this example adopts the following 4 performance indexes, calculated as follows:

(1) Average timeout rate:

$\bar{T}_{out} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{t_{out}(\tau)}{n(\tau)}$

(2) Average energy consumption:

$\bar{E} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} E_i$

(3) Average cost:

$\bar{F} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} F_i$

(4) Average load balancing coefficient:

$\bar{L} = \dfrac{\sum_{\tau} \sum_{i} a(i) \, \bar{l}(i)}{\sum_{\tau} \sum_{i} a(i)}$

where $t_{out}(\tau)$ represents the number of tasks that time out in slot $\tau$; $n(\tau)$ represents the number of tasks in slot $\tau$; $a(i) \in \{0, 1\}$, with $a(i) = 0$ denoting local computation and $a(i) = 1$ denoting unloaded computation; and $\bar{l}(i)$ represents the average server load while a given task is unloaded.

To display intuitively the importance share of each performance index within one task unloading process, together with the overall system performance, the invention designs a per-slot system benefit function $U(\tau)$, calculated as:

$U(\tau) = -\dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} \big( \lambda_1 \widetilde{T_i} + \lambda_2 \widetilde{E_i} + \lambda_3 \widetilde{F_i} + \lambda_4 \widetilde{L_i} \big)$

where $T_i$ represents the task processing delay of vehicle $v_i$; $E_i$ represents the task processing energy consumption of vehicle $v_i$; $F_i$ represents the task processing cost of vehicle $v_i$; and $L_i$ represents the server load balancing coefficient while the task of vehicle $v_i$ is processed. The system benefit function is negative, and the larger its value, the better the system performance.
Each scheme was run 30 times, and the average of each index over the runs was computed. Table 2 shows the experimental results of the method of the invention and the other schemes.
Table 2: Experimental results (the table is reproduced only as an image in the source document).
From the table we can draw the following conclusions. The method of the invention performs best on several indexes, including timeout rate, energy consumption and load balancing, while also greatly improving the average system benefit. Further analysis of Table 2 shows the following. Scheme 1 adopts random unloading and may use local or cloud computation, which reduces the use of the edge servers to some extent, so it performs relatively well on rental cost and load balancing. Scheme 2 adopts purely local computation, so its energy consumption is the highest; because the computing resources of the vehicle are insufficient, its timeout rate is also the highest and its system benefit the lowest. Scheme 3 reduces the timeout rate, but the DQN algorithm cannot handle decision problems in a continuous action space, and task unloading decisions can only be obtained by discretizing it; owing to the resulting dimensionality problem and its limited decision quality, scheme 3 performs poorly on the two indexes of lease cost and load balancing, although it still exceeds schemes 1 and 2 in system benefit. The SAC algorithm of scheme 4 can handle decision problems in a continuous action space and achieves the best rental cost; its overall performance exceeds scheme 3 but is weaker than schemes 5 and 6. Scheme 5, although strong overall, is weaker than the method of the invention because it does not use the vehicle's own resources while edge computing resources are limited.
The technical effects of the present invention are next analyzed from the following two aspects:
(1) Effectiveness analysis
Firstly, the invention realizes multiparty cooperation and collaborative processing between vehicles and service nodes, improving the overall efficiency and performance of the Internet of Vehicles. Task unloading transfers the data processing services of a vehicle to edge or cloud nodes, and the invention achieves optimal resource utilization through efficient task unloading and resource allocation planning, thereby improving the overall efficiency and performance of the Internet of Vehicles.
Secondly, the invention effectively reduces the task unloading cost and improves resource utilization through its multi-objective joint optimization task unloading strategy. By making full use of the resources of both the vehicles and the service nodes to complete calculation tasks, it alleviates the problem of insufficient computing resources at the vehicle and the edge, prolongs the endurance of the vehicle, reduces the resource lease cost and improves the load balancing rate of the edge servers. Most importantly, by reducing the task timeout rate it lowers vehicle risk and improves driver safety when the vehicle handles services such as automatic driving. By taking into account the different computing capacities and load levels of different servers, it further improves the load balancing rate of the edge servers and achieves higher resource utilization.
In conclusion, the method is highly effective. Through multiparty cooperation and collaborative processing between vehicles and service nodes, and through the multi-objective joint optimization task unloading strategy, the invention alleviates to a certain extent the problem of insufficient computing resources at the vehicle and the edge, prolongs the endurance of the vehicle, reduces the resource lease cost and improves the resource utilization of the edge servers. Meanwhile, extensive experimental results show that the invention achieves notable improvements in system benefit compared with the other schemes. In addition, FIGs. 5, 6 and 7 demonstrate its superior performance.
(2) Adaptability analysis
First, the invention considers, at the system level, the heterogeneity among different vehicles and different servers: the hardware and software configurations of different vehicles may differ, and the computing performance and operating states of different edge servers may differ. During task unloading, tasks can therefore be unloaded according to the characteristics of the individual vehicles and servers so that they complete successfully.
Secondly, at the application level, the invention considers the different computing requirements of vehicles, such as those of automatic driving and online entertainment services, and selects the most suitable task unloading scheme for each requirement to achieve the best performance and effect.
In summary, the invention can adapt to different task vehicles, different edge servers and different calculation demands, and has higher adaptability.
While the foregoing is directed to embodiments of the present invention, other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow.

Claims (1)

1. A multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles, characterized by comprising the following steps:
s1.1, constructing a vehicle edge computing task unloading system model based on clusters:
aggregation of Internet of vehicles components in a certain area into cluster C s S epsilon {1, 2..s }, each cluster is independent of the other, each cluster can be regarded as an independent individual, and is mainly composed of the following four parts: (1) Main control base station integrating SDN
(2) M base stations providing edge computing services, $\{b_1, b_2, \dots, b_M\}$, each base station bound with an edge server; (3) $N_s$ task vehicles inside $C_s$, $\{v_1, v_2, \dots, v_{N_s}\}$; and (4) a cloud server $C_{server}$. The control center performs task unloading and resource scheduling control through a task unloading instruction, defined as:

$A = \{a_{node}, a_{ratio}, a_{lease}, a_{power}\}$

where the elements of the set respectively represent the task unloading node, the task unloading proportion, the resource lease proportion and the signal transmitting power.

The control center determines the task unloading instruction according to decision factors, which comprise the task time delay, vehicle energy consumption, resource lease cost and server load penalty value. The weight coefficients of the four parts are set in sequence as $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ and satisfy:

$0 < \lambda_i < 1, \qquad \sum_{i=1}^{4} \lambda_i = 1$
s1.2, building a basic model of each component:
s1.2.1, establishing a basic model of the task vehicle:
$v_i = \{i, p_i, f_i, k_i\}$

where $i$ is the serial number of the task vehicle, $p_i$ is the relative position of the vehicle, $f_i$ is the computing frequency of the vehicle, and $k_i$ is the computing power coefficient of the vehicle-mounted OBU;

S1.2.2 Establish the task model of the task vehicle:

$t_j = \{j, in_j, cal_j, t_j^{max}\}$

where $j$ is the task level, $j \in \{1, 2, \dots, J\}$, with $J$ levels in total; $in_j$ is the task input data amount; $cal_j$ is the task computation amount; and $t_j^{max}$ is the maximum expected time delay of the task. The higher the task level, the stricter the time-delay requirement and the higher the corresponding priority during task unloading;
s1.2.3 builds a basic model of the serving base station:
$b_m = \{m, f_m, p_m, l_m\}$

where $m$ is the base station number, $m \in \{1, 2, \dots, M\}$; $f_m$ is the computing frequency of the server; $p_m$ is the computing unit price, whose value is positively correlated with the real-time computing frequency of the task; and $l_m$ is the load factor of the base station;
s1.2.4 builds a basic model of the cloud server:
$c_c = \{f_c, p_c, t_c\}$

where $f_c$ is the computing frequency, $p_c$ is the computing unit price, and the data transmission delay $t_c$ follows the Gaussian distribution $t_c \sim \mathcal{N}(u_c, \sigma_c^2)$;
S1.3, establishing a task unloading model:
based on 5G cellular communication, a base station increases data transmission capability by using a Massive antenna array (Massive MIMO) technology, and models Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication by combining a Non-orthogonal frequency division multiple access (NOMA) technology. V2V communication is carried out between vehicles through a PC5 interface; V2I communication is carried out between the vehicle and the base station through a Uu interface; the base station, the control center and the cloud server are communicated with each other mainly through optical fibers; the base stations perform bidirectional interaction through a control center; task offloading two modes, one is total offloading; the second is to unload part, unload part according to task separability according to certain division rule unload part calculate task to the goal server to carry out; the target servers are a plurality of edge servers with higher performance and high-performance cloud servers with higher transmission delay;
the data transmission of the Internet of vehicles is carried out through the base station, and the base station is allocated with certain bandwidth resources which can be allocated to different vehicles for data transmission; to describe the fading characteristics of a multipath channel, the communication channel is modeled as a rayleigh channel; the allocation of the bandwidth resources by the base station needs to consider a plurality of factors, such as the communication requirement of the vehicle, the network congestion degree, the load of the base station and the like; the data transmission rate is defined as:
Figure FDA0004150744970000022
wherein the method comprises the steps of
Figure FDA0004150744970000023
Indicating base station b m Assigned to vehicle v i Bandwidth ρ of i Signal transmission power, delta, representing vehicle i Signal gain, sigma, representing the environment in which the vehicle is located m Representing standard deviation of white gaussian noise in the environment;
task offloading can be divided into local computing, edge computing and cloud computing according to different offloading modes and computing nodes;
s1.3.1 a local calculation model is built:
the time delay of local calculation is mainly task decision time delay and calculation time delay, and the task decision time delay t dec Satisfy Gaussian distribution
Figure FDA0004150744970000024
Figure FDA0004150744970000025
Calculating time delay:
Figure FDA0004150744970000031
and (3) locally calculating energy consumption:
Figure FDA0004150744970000032
defining a local calculation cost:
Figure FDA0004150744970000033
s1.3.2 an edge calculation model is built:
the time delay of the edge calculation is mainly task decision time delay, task uploading time delay, task transfer time delay and task execution time delay, and the feedback time delay of the calculation result is ignored;
Figure FDA0004150744970000034
edge calculation energy consumption:
Figure FDA0004150744970000035
edge computing rental fee:
Figure FDA0004150744970000036
edge calculation load penalty:
Figure FDA0004150744970000037
defining edge computation cost:
Figure FDA0004150744970000038
wherein L represents the length of the lane; t is t loss Representing the transmission loss rate;
Figure FDA0004150744970000039
representing task t j Is the actual calculation rate of (a), edge server b m Calculation rate and load rate l of (2) m Related to; a, b, c are coefficients of a binary linear function with respect to the load penalty value;
s1.3.3 an edge calculation model is built:
the time delay of cloud computing is mainly task decision time delay, task uploading time delay, task transmission time delay and task computing time delay, and the result feedback time delay is ignored;
Figure FDA0004150744970000041
task calculation time delay:
Figure FDA0004150744970000042
cloud computing rental fees:
Figure FDA0004150744970000043
defining cloud computing costs:
Figure FDA0004150744970000044
s1.4, establishing a multi-objective joint optimization model:
the system cost, namely the optimization target, is defined as the weighted sum of task delay, vehicle energy consumption, resource lease cost and server load penalty value;
Figure FDA0004150744970000045
wherein lambda is i Satisfy 0 < lambda i <1,
Figure FDA0004150744970000046
The symbol (x) represents the normalized prize value;
the task offloading policy is defined as: solving a set of service strategies that minimizes average system cost, i.e., maximizes rewards or system benefits, over a long period of time, thus the problem can be modeled as:
Figure FDA0004150744970000047
Figure FDA0004150744970000051
Figure FDA0004150744970000052
Figure FDA0004150744970000053
Figure FDA0004150744970000054
Figure FDA0004150744970000055
Figure FDA0004150744970000056
where n (τ) represents the number of computation tasks at τ slots; d (D) i Representing vehicle v i Task processing energy consumption of (a); c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Resource lease costs of (2); l (L) i Is shown in the process of handling vehicle v i Server load balancing coefficients during the task; (a) The value range of each bonus weight coefficient is 0,1]And the sum is 1; (b) The value range of each sub-action is [0,1 ] for any action]The method comprises the steps of carrying out a first treatment on the surface of the (c) Representing the ratio of task offloading and the resources allocated by the server m for any vehicle at any time slotThe ratio is [0,1 ]]The method comprises the steps of carrying out a first treatment on the surface of the (d) Indicating that the sum of the proportion of resources allocated by the server m in any time slot is less than or equal to 1; (e) When the task is edge calculation, the task unloading proportion and the resource lease proportion are not equal to 0; (f) When the task is cloud computing, the task unloading proportion is 1;
S1.5 task unloading decision agent training and evaluation:
the invention adopts a deep reinforcement learning algorithm based on an improved twin delayed deep deterministic policy gradient algorithm (Twin Delayed Deep Deterministic Policy Gradient, TD3) for task unloading and resource scheduling decisions; TD3 is a deep reinforcement learning algorithm for continuous control tasks with six networks in total: two value networks q(s, a; w_i), i = 1, 2, and one policy network u(s; θ), each paired with a corresponding target network; the value network parameters w_i and the policy network parameters θ are randomly initialized and assigned to the target network parameters w_i⁻ and θ⁻, respectively;
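For concreteness, the six-network layout could be initialized as in the following PyTorch sketch; the network sizes and state/action dimensions are assumptions, not values from the filing.

```python
# Sketch of the six TD3 networks (two critics, one actor, their targets),
# initialized as described above; architecture sizes are assumptions.
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

state_dim, action_dim = 12, 4                                 # assumed dims
actor = nn.Sequential(mlp(state_dim, action_dim), nn.Tanh())  # u(s; theta)
critics = [mlp(state_dim + action_dim, 1) for _ in range(2)]  # q(s, a; w_i)

# Target networks start as exact copies of the randomly initialized networks.
actor_target = copy.deepcopy(actor)
critic_targets = [copy.deepcopy(c) for c in critics]
```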
S1.5.1 environmental input state preprocessing:
the environmental state is represented by a vector s, a multidimensional vector composed of the calculation frequency and load coefficient of each edge server, the vehicle position, the vehicle's calculation frequency and calculation power, and the task's input data size, required calculation amount, and maximum delay:
[Equation image FDA00041507449700000510: composition of the state vector s]
since the elements of the state vector differ in magnitude and order of magnitude, state normalization is needed; the following is applied to each vector element:
s̃_i = (s_i − s_i^min) / (s_i^max − s_i^min), i = 1, 2, …, I
where I represents the dimension of the environmental state vector s, and s_i^max and s_i^min respectively represent the upper and lower limit values of the i-th vector element;
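The per-element min-max scaling above maps each state component into [0,1]; a short sketch, assuming the bounds are known from the environment:

```python
# Min-max normalization of the environmental state vector, as in S1.5.1;
# the per-element bounds are assumed to be supplied by the environment.
import numpy as np

def normalize_state(s, s_min, s_max):
    """Map each element of s into [0, 1] using its known bounds."""
    s, s_min, s_max = map(np.asarray, (s, s_min, s_max))
    return (s - s_min) / (s_max - s_min + 1e-12)  # eps avoids division by zero
```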
S1.5.2 a hybrid action space:
the neural network outputs a 4-dimensional vector a representing the task offloading decision: the offloading node, the offloading proportion, the resource lease proportion, and the signal transmitting power; the network output uses tanh as the activation function and is then converted into the decision action of the agent:
[Equation images FDA0004150744970000066 and FDA0004150744970000067: conversion of the raw network output into the executable action]
where a′ denotes the raw output action of the neural network and a denotes the action the agent actually executes when interacting with the environment; the offloading-node component of a′ is further processed: with N_num the number of available task offloading nodes in the environment and round(x, 0) rounding x to an integer, the action space is converted from a continuous action space into a hybrid action space by discretizing the continuous action, expanding the application range of TD3; P_min and P_max respectively represent the lower and upper limit values of the vehicle communication power;
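One plausible realization of this conversion is sketched below; the exact affine mappings appear only as images in the filing, so the formulas here are assumptions consistent with the surrounding text (tanh outputs rescaled, node index rounded, power mapped into [P_min, P_max]).

```python
# Assumed continuous-to-hybrid action conversion: tanh outputs in [-1, 1]
# are rescaled to [0, 1], the node component is rounded to an integer index
# via round(x, 0), and the power is mapped into [p_min, p_max].

def to_env_action(a_raw, n_nodes, p_min, p_max):
    unit = [(x + 1.0) / 2.0 for x in a_raw]        # [-1, 1] -> [0, 1]
    node = int(round(unit[0] * (n_nodes - 1), 0))  # discrete offloading node
    offload_ratio = unit[1]                        # continuous, in [0, 1]
    lease_ratio = unit[2]                          # continuous, in [0, 1]
    power = p_min + unit[3] * (p_max - p_min)      # transmit power
    return node, offload_ratio, lease_ratio, power
```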
S1.5.3 reward value normalization:
the reward consists of the task delay, vehicle energy consumption, resource lease fee, and server load reward; each sub-reward value is normalized, and the final reward value is obtained by scaling each sub-reward with its scaling factor and summing:
[Equation images FDA00041507449700000611 and FDA00041507449700000612: sub-reward normalization and weighted summation]
where r(i)_max is the upper limit value of the i-th sub-reward and w_i is the scaling factor of the i-th sub-reward;
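A one-line sketch of this composition; the exact normalization formula is an image in the filing, so dividing each sub-reward by its upper limit r(i)_max is an assumption consistent with the text.

```python
# Assumed reward composition of S1.5.3: each sub-reward normalized by its
# upper limit, then combined with its scaling factor w_i.

def total_reward(sub_rewards, r_max, w):
    return sum(w_i * (r_i / m_i)
               for r_i, m_i, w_i in zip(sub_rewards, r_max, w))
```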
S1.5.4 a dual-noise policy network is introduced to obtain the experience trajectories of the agent:
TD3 is an off-policy algorithm; in the exploration phase, the agent explores unknown states and actions in the environment by taking exploratory, random strategies; to increase the exploration rate, the invention introduces dual-noise interference: first, output noise of the policy network; second, environmental noise when the agent interacts with the environment;
[Equation image FDA0004150744970000071: action with policy-network noise]
[Equation image FDA0004150744970000072: action with environmental noise]
where u_a and σ_a respectively represent the mean and standard deviation of the policy network noise, and u_env and σ_env respectively represent the mean and standard deviation of the environmental noise;
combining the policy noise with environment interaction, the agent accumulates trajectory data until a specified threshold is met, recording the state, action, reward, and next state of each step; all trajectory data are integrated into an experience pool for training;
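The collection loop could look like the following sketch, where Gaussian noise is applied twice, once at the policy output and once at the environment boundary; `env` and the noise parameters are hypothetical placeholders.

```python
# Sketch of the dual-noise exploration loop of S1.5.4: Gaussian noise is
# added to the policy output and again to the action applied in the
# environment; each transition is stored in the experience pool.
import collections
import numpy as np
import torch

replay = collections.deque(maxlen=100_000)  # experience pool

def explore(env, actor, steps, u_a, sigma_a, u_env, sigma_env):
    s = env.reset()
    for _ in range(steps):
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32)).numpy()
        a = a + np.random.normal(u_a, sigma_a, a.shape)      # policy noise
        a = a + np.random.normal(u_env, sigma_env, a.shape)  # environment noise
        a = np.clip(a, -1.0, 1.0)
        s_next, r, done = env.step(a)
        replay.append((s, a, r, s_next))  # state, action, reward, next state
        s = env.reset() if done else s_next
```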
S1.5.5 decision agent training:
the experience pool obtained in S1.5.4 is split into mini-batches for training the TD3 model; training the policy network: the parameter θ is updated to maximize the value network evaluation Q, with the gradient computed via the chain rule:
[Equation image FDA0004150744970000075: policy gradient g via the chain rule]
the parameter θ is then updated by gradient ascent:
θ_new ← θ_now + β·g
in particular, α and β are learning rates and γ is the discount rate; these are hyperparameters that must be tuned manually; the subscript "new" denotes the updated network parameters;
the value network parameters are updated next; the TD error is the difference between the predicted value and the TD target:
[Equation image FDA0004150744970000076: TD target, using the smaller output of the two target value networks]
the loss function is defined as the mean square error between the predicted value and the smaller TD target of the two target value networks:
[Equation image FDA0004150744970000077: value-network loss function]
the value network parameters are updated by gradient descent:
[Equation image FDA0004150744970000078: gradient-descent update of w_i]
policy delay update: after the prediction networks have been updated for h rounds, the target network parameters are updated, with φ the weight ratio of the new parameters to the old:
θ⁻ ← φ·θ + (1 − φ)·θ⁻
w_i⁻ ← φ·w_i + (1 − φ)·w_i⁻, i = 1, 2
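These update rules (chain-rule policy gradient, TD target from the smaller target critic, MSE loss with gradient descent, and φ-weighted delayed target updates) correspond to the standard TD3 update step; the PyTorch sketch below is one possible realization, with the hyperparameters (gamma, phi, h) and network/optimizer handles assumed rather than taken from the filing.

```python
# Sketch of one TD3 update (S1.5.5): clipped double-Q TD target, MSE critic
# loss minimized by gradient descent, delayed actor update (gradient ascent
# on Q realized as descent on -Q), and soft phi-weighted target updates.
import torch
import torch.nn.functional as F

def td3_update(batch, actor, actor_t, critics, critic_ts,
               actor_opt, critic_opts, step, gamma=0.99, phi=0.005, h=2):
    s, a, r, s2 = batch  # mini-batch tensors drawn from the experience pool
    with torch.no_grad():
        a2 = actor_t(s2)
        # TD target uses the smaller of the two target value networks
        q_next = torch.min(*(ct(torch.cat([s2, a2], 1)) for ct in critic_ts))
        y = r + gamma * q_next
    for critic, opt in zip(critics, critic_opts):  # gradient descent on MSE
        loss = F.mse_loss(critic(torch.cat([s, a], 1)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    if step % h == 0:  # delayed policy update after h critic rounds
        actor_loss = -critics[0](torch.cat([s, actor(s)], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        with torch.no_grad():  # soft update: target <- phi*new + (1-phi)*old
            for net, tgt in zip([actor, *critics], [actor_t, *critic_ts]):
                for p, pt in zip(net.parameters(), tgt.parameters()):
                    pt.mul_(1 - phi).add_(phi * p)
```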
the network training effect is tested each time the parameters are updated; S1.5.4 and S1.5.5 are then executed repeatedly until a preset exploration-step threshold is reached;
after network training is finished, performance evaluation indexes are designed to verify the effectiveness of the algorithm;
S1.5.6 decision agent evaluation:
the agent obtained in S1.5.5 is evaluated through strategy performance indexes, calculated as follows:
(1) Average timeout rate:
[Equation image FDA0004150744970000083]
(2) Average energy consumption:
[Equation image FDA0004150744970000084]
(3) Average cost:
[Equation image FDA0004150744970000085]
(4) Average load balancing coefficient:
[Equation image FDA0004150744970000086]
where t_out(τ) represents the number of tasks that time out in slot τ; n(τ) represents the number of task vehicles in slot τ; T_end represents the total number of task time slots; a(i) ∈ {0,1}, where a(i) = 0 denotes local computation and a(i) = 1 denotes offloaded computation; the overlined load symbol (equation image FDA0004150744970000087) represents the average server load when a given task is offloaded;
in order to visually display the relative importance of each performance index in one task offloading process, as well as the overall system performance, the invention designs a system benefit function (equation-image symbol FDA0004150744970000088) defined over each time slot within a finite time; its calculation formula is:
[Equation image FDA0004150744970000089: system benefit function]
where D_i represents the task processing delay of vehicle v_i; C_i represents the task processing energy consumption of vehicle v_i; F_i represents the task processing cost of vehicle v_i; L_i represents the server load balancing coefficient while processing the task of vehicle v_i; the system benefit function is negative, and the larger its value, the better the system performance.
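The evaluation metrics could be computed from per-slot logs as in the following sketch; the exact averaging formulas are images in the filing, so the per-slot averaging and the negative weighted sum below are assumptions consistent with the surrounding text.

```python
# Sketch of the S1.5.6 evaluation metrics, assuming per-slot logs of
# timeouts t_out(tau) and task counts n(tau), plus per-task records of
# energy, cost, and load coefficients.
import numpy as np

def average_timeout_rate(t_out, n):  # per-slot timeout ratio, averaged
    t_out, n = np.asarray(t_out), np.asarray(n)
    return float(np.mean(t_out / np.maximum(n, 1)))

def average_of(per_task_values):  # average energy, cost, or load coefficient
    return float(np.mean(per_task_values))

def system_benefit(D, C, F, L, weights):
    """Negative weighted sum of delay, energy, cost, and load terms;
    a larger (less negative) value indicates better system performance."""
    return -sum(w * x for w, x in zip(weights, (D, C, F, L)))
```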
The advantages of the invention are:
Firstly, in task unloading and resource scheduling decisions the invention considers not only user demands such as task delay, vehicle energy consumption, and calculation expenditure, but also the load condition of the servers; by improving the load balancing rate of the servers it raises resource utilization, which brings significant economic benefit.
Secondly, the task unloading problem of the Internet of Vehicles is studied in combination with the characteristics of the 5G cellular network; exploiting the high speed, low delay, high capacity, wide bandwidth, and other characteristics of 5G greatly reduces the delay of task unloading and improves user satisfaction.
Finally, an improved TD3 deep reinforcement learning algorithm is adopted for task unloading and resource scheduling decisions: (1) a dual-noise policy network improves the exploration rate of the agent; (2) the hybrid action space expands the application range of TD3. Deep reinforcement learning trades offline training time for online decision cost, greatly reducing decision complexity and improving decision efficiency. The invention has important application prospects and can play an important role in the field of the Internet of Vehicles.
CN202310318141.7A 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles Pending CN116321298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310318141.7A CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310318141.7A CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Publications (1)

Publication Number Publication Date
CN116321298A true CN116321298A (en) 2023-06-23

Family

ID=86797722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310318141.7A Pending CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Country Status (1)

Country Link
CN (1) CN116321298A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117042051A (en) * 2023-08-29 2023-11-10 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles
CN117042051B (en) * 2023-08-29 2024-03-08 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles
CN117412349A (en) * 2023-12-13 2024-01-16 湖南大学无锡智能控制研究院 Service switching method, device and system based on edge server performance

Similar Documents

Publication Publication Date Title
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN116321298A (en) Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles
CN113435472A (en) Vehicle-mounted computing power network user demand prediction method, system, device and medium
CN114189892A (en) Cloud-edge collaborative Internet of things system resource allocation method based on block chain and collective reinforcement learning
CN113568675A (en) Internet of vehicles edge calculation task unloading method based on layered reinforcement learning
CN114143346B (en) Joint optimization method and system for task unloading and service caching of Internet of vehicles
CN113132943B (en) Task unloading scheduling and resource allocation method for vehicle-side cooperation in Internet of vehicles
CN113543074A (en) Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
CN115134242B (en) Vehicle-mounted computing task unloading method based on deep reinforcement learning strategy
Lv et al. Edge computing task offloading for environmental perception of autonomous vehicles in 6G networks
Gao et al. Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing
CN115052262A (en) Potential game-based vehicle networking computing unloading and power optimization method
CN117221950A (en) Vehicle task unloading method and system based on deep reinforcement learning
CN116566838A (en) Internet of vehicles task unloading and content caching method with cooperative blockchain and edge calculation
CN117221951A (en) Task unloading method based on deep reinforcement learning in vehicle-mounted edge environment
CN116582836B (en) Task unloading and resource allocation method, device, medium and system
CN117290071A (en) Fine-grained task scheduling method and service architecture in vehicle edge calculation
CN115865914A (en) Task unloading method based on federal deep reinforcement learning in vehicle edge calculation
CN114928826A (en) Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
CN115208892A (en) Vehicle-road cooperative online task scheduling method and system based on dynamic resource demand
Chouikhi et al. Energy-Efficient Computation Offloading Based on Multi-Agent Deep Reinforcement Learning for Industrial Internet of Things Systems
CN114584951A (en) Combined computing unloading and resource allocation method based on multi-agent DDQN
CN118113484B (en) Resource scheduling method, system, storage medium and vehicle
Sun et al. Deep Reinforcement Learning for Energy Minimization in Multi-RIS-Aided Cell-Free MEC Networks
CN118612754B (en) Three-in-one terminal control system and method capable of intelligent networking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination