CN116321298A - Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Info

Publication number: CN116321298A
Application number: CN202310318141.7A
Authority: CN (China)
Prior art keywords: task, vehicle, value, calculation, time delay
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Ma Qiang (马强), He Jie (何杰), Xing Ling (邢玲), Gao Jianping (高建平), Wu Honghai (吴红海)
Assignee (current and original): Southwest University of Science and Technology
Application CN202310318141.7A filed by Southwest University of Science and Technology, priority to CN202310318141.7A; publication of CN116321298A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/02: Traffic management, e.g. flow control or congestion control
    • H04W28/08: Load balancing or load distribution
    • H04W28/09: Management thereof
    • H04W28/0925: Management thereof using policies
    • H04W28/0958: Management thereof based on metrics or performance parameters
    • H04W28/0967: Quality of Service [QoS] parameters
    • H04W28/0975: Quality of Service [QoS] parameters for reducing delays
    • H04W28/0983: Quality of Service [QoS] parameters for optimizing bandwidth or throughput
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H04W4/40: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]


Abstract

The invention discloses a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles. The strategy comprehensively considers factors such as task type, data volume, computing resources and geographical environment, takes multiple indexes (task timeout rate, vehicle energy consumption, resource lease cost and server load balancing coefficient) as optimization targets, and takes improving the total benefit of the system as its final aim. The strategy builds a system decision model on an improved TD3 deep reinforcement learning algorithm and constructs a cluster-based multi-task, multi-objective partial-unloading environment. To improve decision accuracy, a dual-noise policy network is proposed to raise the exploration rate of the agent, and a hybrid action space is adopted to improve the adaptability of the algorithm. Against the problems of a single optimization target and poor environmental adaptability in existing strategies, the invention adaptively completes calculation tasks by analyzing the characteristics of the task, the vehicle and the edge computing environment, effectively reduces the task unloading cost of the Internet of Vehicles, and improves resource utilization.

Description

Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles
Technical Field
The invention belongs to the field of Internet of Vehicles edge computing, and particularly relates to a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles.
Background
Internet of Vehicles edge computing is a technology that places computing and storage resources on edge nodes close to the vehicles, with the aim of reducing data transmission delay and network bandwidth consumption and thereby improving the real-time performance and reliability of the Internet of Vehicles system. Task unloading is one of the key technologies of edge computing: vehicle-mounted calculation tasks are handed to edge nodes for processing, which effectively reduces the on-board computing burden and improves on-board computing performance and efficiency. Research on Internet of Vehicles edge-computing task unloading aims to improve the performance and efficiency of the Internet of Vehicles system by solving problems of task scheduling, resource management and load balancing. However, the prior art suffers from insufficient resources, changeable environments and complex decisions, which lead to high time delay, high energy consumption and low resource utilization during task unloading. Therefore, to advance the research and application of vehicle edge computing and task unloading, theoretical exploration and practical innovation must be strengthened, and the related technologies and algorithms continuously improved, so as to enhance the performance and benefits of the Internet of Vehicles system.
Related research on task offloading strategies in the field of Internet of Vehicles edge computing already exists, but most results focus on a single optimization objective or a single user. Liu Guozhi et al. [Liu Guozhi, Dai Fei, Mo Qi, et al. Computer Integrated Manufacturing Systems, 2022, 28(10): 12] propose an end-edge-cloud collaborative service offloading architecture in a vehicle edge computing environment and adopt a Deep Q-Network (DQN) based task offloading method with minimization of the average service delay as the optimization objective, effectively reducing the task processing delay under the computing and communication resource constraints of the edge servers. However, the method has a single optimization objective and does not consider other factors that may affect the task offloading decision.
Xiaolong Xu et al. [Xu X, Huang Q, Zhu H, et al. Secure Service Offloading for Internet of Vehicles in SDN-Enabled Mobile Edge Computing. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3720-3729] present a security service offloading framework based on SDN (Software Defined Networking) and mobile edge computing techniques. However, the framework's emphasis is on the communication security and cooperation efficiency of interconnected vehicles, and its security aspects require more detailed investigation.
Hansong Wang et al. [Wang H, Li X, Ji H, et al. Federated Offloading Scheme to Minimize Latency in MEC-Enabled Vehicular Networks. 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates: IEEE, 2018: 1-6] propose a joint offloading scheme to minimize overall latency. Tasks are divided into three parts (local computation, edge computation and neighboring-vehicle computation), and the allocation proportions among the three parties are tuned so that the whole task completes with the shortest delay. The scheme effectively improves the utilization of computing resources and reduces task delay, but it too suffers from a single optimization target and poor environmental adaptability.
While the above related work addresses, to some extent, several key issues of task unloading in Internet of Vehicles edge computing, it still has shortcomings. First, most of these methods have a single optimization target: they do not comprehensively consider, at the system level, the multiple factors affecting task unloading, cannot effectively improve the overall benefit of the system, and may be difficult to apply in practice. Second, these methods do not account for the fact that the nature of tasks and the needs of users may change continuously, so they may be unable to adapt quickly to such changes. Therefore, research on Internet of Vehicles task unloading strategies must, on the one hand, consider not only user demands and service quality but also the various factors influencing the overall benefit of the system; on the other hand, the algorithm itself must have high environmental adaptability and be able to adapt quickly to the continuously changing Internet of Vehicles environment. The invention starts from these two aspects and establishes a task unloading strategy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in an Internet of Vehicles environment. The strategy can effectively improve the completion rate of tasks under a specified time-delay requirement, reduce the task unloading cost and improve the utilization rate of system resources.
In order to achieve the above purpose, the multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the internet of vehicles of the invention comprises the following steps:
s1.1, constructing a vehicle edge computing task unloading system model based on clusters:
aggregation of Internet of vehicles components in a certain area into cluster C s S epsilon {1, 2..s }, each cluster is independent of the other, each cluster can be regarded as an independent individual, and is mainly composed of the following four parts: (1) Main control base station integrating SDN
(2) M base stations providing edge computing services, $\{b_1, b_2, \dots, b_M\}$, each base station bound with an edge server; (3) $N_s$ task vehicles inside $C_s$, $\{v_1, v_2, \dots, v_{N_s}\}$; and (4) a cloud server $C_{server}$. The control center performs task unloading and resource scheduling control through a task unloading instruction, defined as:

$A = \{a_{node}, a_{ratio}, a_{lease}, a_{power}\}$

where the elements of the set respectively represent the task unloading node, the task unloading proportion, the resource lease proportion and the signal transmitting power.

The control center determines the task unloading instruction according to decision factors, which comprise the task time delay, vehicle energy consumption, resource lease cost and server load penalty value. The weight coefficients of the four parts are set in sequence as $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ and satisfy:

$0 < \lambda_i < 1, \qquad \sum_{i=1}^{4} \lambda_i = 1$
s1.2, building a basic model of each component:
s1.2.1, establishing a basic model of the task vehicle:
$v_i = \{i, p_i, f_i, k_i\}$

where $i$ is the serial number of the task vehicle, $p_i$ is the relative position of the vehicle, $f_i$ is the computing frequency of the vehicle, and $k_i$ is the computing power coefficient of the vehicle-mounted OBU;

S1.2.2 Establish the task model of the task vehicle:

$t_j = \{j, in_j, cal_j, t_j^{max}\}$

where $j$ is the task level, $j \in \{1, 2, \dots, J\}$, with $J$ levels in total; $in_j$ is the task input data amount; $cal_j$ is the task computation amount; and $t_j^{max}$ is the maximum expected time delay of the task. The higher the task level, the stricter the time-delay requirement and the higher the corresponding priority during task unloading;
s1.2.3 builds a basic model of the serving base station:
$b_m = \{m, f_m, p_m, l_m\}$

where $m$ is the base station number, $m \in \{1, 2, \dots, M\}$; $f_m$ is the computing frequency of the server; $p_m$ is the computing unit price, whose value is positively correlated with the real-time computing frequency of the task; and $l_m$ is the load factor of the base station;
s1.2.4 builds a basic model of the cloud server:
$c_c = \{f_c, p_c, t_c\}$

where $f_c$ is the computing frequency, $p_c$ is the computing unit price, and the data transmission delay $t_c$ follows the Gaussian distribution $t_c \sim \mathcal{N}(u_c, \sigma_c^2)$;
S1.3, establishing a task unloading model:
based on 5G cellular communication, a base station increases data transmission capability by using a Massive antenna array (Massive MIMO) technology, and models Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication by combining a Non-orthogonal frequency division multiple access (NOMA) technology. V2V communication is carried out between vehicles through a PC5 interface; V2I communication is carried out between the vehicle and the base station through a Uu interface; the base station, the control center and the cloud server are communicated with each other mainly through optical fibers; the base stations perform bidirectional interaction through a control center; task offloading two modes, one is total offloading; the second is to unload part, unload part according to task separability according to certain division rule unload part calculate task to the goal server to carry out; the target servers are a plurality of edge servers with higher performance and high-performance cloud servers with higher transmission delay;
the data transmission of the Internet of vehicles is carried out through the base station, and the base station is allocated with certain bandwidth resources which can be allocated to different vehicles for data transmission; to describe the fading characteristics of a multipath channel, the communication channel is modeled as a rayleigh channel; the allocation of the bandwidth resources by the base station needs to consider a plurality of factors, such as the communication requirement of the vehicle, the network congestion degree, the load of the base station and the like; the data transmission rate is defined as:
Figure BDA0004150744980000041
wherein the method comprises the steps of
Figure BDA0004150744980000042
Indicating base station b m Assigned to vehicle v i Bandwidth ρ of i Signal transmission power, delta, representing vehicle i Signal gain, sigma, representing the environment in which the vehicle is located m Representing standard deviation of white gaussian noise in the environment;
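For illustration, the rate model can be evaluated directly. The following minimal Python sketch computes this rate; the function and parameter names are chosen for readability and are not part of the patent:

```python
import math

def transmission_rate(bandwidth_hz: float, tx_power_w: float,
                      channel_gain: float, noise_std: float) -> float:
    """Shannon-capacity rate of the V2I link in bits/s.

    bandwidth_hz : bandwidth B_i^m the base station assigns to the vehicle
    tx_power_w   : vehicle signal transmission power rho_i (watts)
    channel_gain : environment signal gain delta_i (dimensionless)
    noise_std    : standard deviation sigma_m of the Gaussian white noise
    """
    snr = tx_power_w * channel_gain / noise_std ** 2
    return bandwidth_hz * math.log2(1.0 + snr)

# Example: a 10 MHz allocation, 0.5 W transmit power, 1e-9 channel gain
# and 1e-6 noise standard deviation give roughly 90 Mbit/s.
print(transmission_rate(10e6, 0.5, 1e-9, 1e-6))
```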
task offloading can be divided into local computing, edge computing and cloud computing according to different offloading modes and computing nodes;
s1.3.1 a local calculation model is built:
the time delay of local calculation is mainly task decision time delay and calculation time delay, and the task decision time delay t dec Satisfy Gaussian distribution
Figure BDA0004150744980000043
Figure BDA0004150744980000044
Calculating time delay:
Figure BDA0004150744980000045
and (3) locally calculating energy consumption:
Figure BDA0004150744980000046
defining a local calculation cost:
Figure BDA0004150744980000051
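A minimal Python sketch of this local cost follows. It assumes the effective-capacitance energy model named above and approximates the normalization operator by division by fixed upper bounds; all names are illustrative:

```python
def local_cost(cal_j: float, f_i: float, k_i: float, t_dec: float,
               lam1: float, lam2: float,
               t_max: float, e_max: float) -> float:
    """Weighted local-computation cost under the models above.

    A sketch only: the energy term uses the assumed k_i * f^2 * cycles
    model, and min-max normalisation is approximated by dividing each
    term by an upper bound (t_max, e_max), standing in for the tilde
    operator of S1.4.
    """
    t_loc = t_dec + cal_j / f_i       # decision delay + computation delay
    e_loc = k_i * f_i ** 2 * cal_j    # local computation energy consumption
    return lam1 * t_loc / t_max + lam2 * e_loc / e_max
```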
s1.3.2 an edge calculation model is built:
the time delay of the edge calculation is mainly task decision time delay, task uploading time delay, task transfer time delay and task execution time delay, and the feedback time delay of the calculation result is ignored;
Figure BDA0004150744980000052
edge calculation energy consumption:
Figure BDA0004150744980000053
edge computing rental fee:
Figure BDA0004150744980000054
edge calculation load penalty:
Figure BDA0004150744980000055
defining edge computation cost:
Figure BDA0004150744980000056
wherein L represents the length of the lane; t is t loss Representing the transmission loss rate;
Figure BDA0004150744980000057
representing task t j Is the actual calculation rate of (a), edge server b m Calculation rate and load rate l of (2) m Related to; a, b, c are coefficients of a binary linear function with respect to the load penalty value;
s1.3.3 an edge calculation model is built:
the time delay of cloud computing is mainly task decision time delay, task uploading time delay, task transmission time delay and task computing time delay, and the result feedback time delay is ignored;
Figure BDA0004150744980000058
task calculation time delay:
Figure BDA0004150744980000061
cloud computing rental fees:
Figure BDA0004150744980000062
defining cloud computing costs:
Figure BDA0004150744980000063
s1.4, establishing a multi-objective joint optimization model:
the system cost, namely the optimization target, is defined as the weighted sum of task delay, vehicle energy consumption, resource lease cost and server load penalty value;
Figure BDA0004150744980000064
wherein lambda is i Satisfy 0 < lambda i <1,
Figure BDA0004150744980000065
The symbol (x) represents the normalized prize value;
the task offloading policy is defined as: solving a set of service strategies that minimizes average system cost, i.e., maximizes rewards or system benefits, over a long period of time, thus the problem can be modeled as:
Figure BDA0004150744980000066
Figure BDA0004150744980000067
Figure BDA0004150744980000068
Figure BDA0004150744980000069
Figure BDA00041507449800000610
Figure BDA00041507449800000611
Figure BDA00041507449800000612
where n (τ) represents the number of mission vehicles at τ slots; d (D) i Representing vehicle v i Task processing energy consumption of (a); c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Resource lease costs of (2); l (L) i Is shown in the process of handling vehicle v i Server load balancing coefficients during the task; (a) The value range of each bonus weight coefficient is 0,1]And the sum is 1; (b) The value range of each sub-action is [0,1 ] for any action]The method comprises the steps of carrying out a first treatment on the surface of the (c) Indicating that the ratio of task offloading and the ratio of resources allocated by the server m for any vehicle at any time slot is [0,1 ]]The method comprises the steps of carrying out a first treatment on the surface of the (d) Indicating that the sum of the proportion of resources allocated by the server m in any time slot is less than or equal to 1; (e) When the task is edge calculation, the task unloading proportion and the resource lease proportion are not equal to 0; (f) When the task is cloud computing, the task unloading proportion is 1;
s1.5 task unloading decision agent training and evaluation:
the invention adopts a depth reinforcement learning algorithm based on an improved dual-delay depth deterministic strategy gradient algorithm (Twin Delayed Deep Deterministic Policy Gradient, TD 3) to carry out task unloading and resource scheduling decision; TD3 is a deep reinforcement learning algorithm for continuous control tasks, which has six networks in total, including two value networks q (s, a; w) i ) I=1, 2 and one policy network u (s; θ) and corresponds to one target network respectively; randomly initializing value network parameters w i And policy network parameters theta, and respectively assigning to the target network
Figure BDA0004150744980000071
And theta -
S1.5.1 environmental input state preprocessing:
the environmental state is represented by vectors
Figure BDA0004150744980000072
Indicating (I)>
Figure BDA0004150744980000073
The system is a multidimensional vector, and consists of calculation frequencies and load coefficients of a plurality of edge servers, vehicle positions, calculation frequencies and calculation power of a vehicle, the input data size of tasks, required calculation amount and maximum time delay:
Figure BDA0004150744980000074
since the sizes of the elements in the vectors are different and the orders of magnitude are also different, the state normalization is needed, and the specific method is implemented for each vector element:
Figure BDA0004150744980000075
wherein I represents an environmental state vector
Figure BDA0004150744980000076
Dimension of->
Figure BDA0004150744980000077
And->
Figure BDA0004150744980000078
Respectively representing an upper limit value and a lower limit value of the vector element;
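A short Python sketch of this element-wise min-max normalization, with illustrative names:

```python
import numpy as np

def normalize_state(s, s_min, s_max):
    """Element-wise min-max normalisation of the environment state vector,
    as in S1.5.1; s, s_min and s_max are arrays of equal length holding the
    raw state and its per-dimension lower and upper limit values."""
    s = np.asarray(s, dtype=np.float64)
    return (s - np.asarray(s_min)) / (np.asarray(s_max) - np.asarray(s_min))
```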
s1.5.2 a hybrid motion space:
the neural network outputs a vector
Figure BDA0004150744980000079
Indicating (I)>
Figure BDA00041507449800000710
The system is a 4-dimensional vector, represents task offloading decision, and respectively represents offloading node, offloading proportion, resource lease proportion, signal transmitting power, neural network output takes tanh as an activation function, and then the neural network output is converted into decision action of an agent:
Figure BDA00041507449800000711
Figure BDA0004150744980000081
wherein the method comprises the steps of
Figure BDA0004150744980000082
For neural network original output actions, < >>
Figure BDA0004150744980000083
Acts when the intelligent agent actually interacts with the environment; for a pair of
Figure BDA0004150744980000084
And (3) further processing: n (N) num Unloading the number of nodes for the available tasks in the environment, wherein round (x, 0) represents rounding x to an integer, so that the action space is converted from the continuous action space to the mixed action space through continuous action discretization, and the application range of TD3 is expanded; p (P) min And P max Respectively representing a lower limit value and an upper limit value of the vehicle communication power;
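The conversion from the raw tanh output to the hybrid action space can be sketched in Python as follows; the exact affine mappings are assumptions consistent with the description above, not formulas quoted from the patent:

```python
import numpy as np

def to_env_action(raw, n_nodes, p_min, p_max):
    """Map a tanh policy output in [-1, 1]^4 onto the hybrid action space.

    raw = (node, offload_ratio, lease_ratio, tx_power). The node component
    is discretised with round(), the two ratios are rescaled to [0, 1],
    and the power is rescaled to [p_min, p_max].
    """
    unit = (np.asarray(raw, dtype=float) + 1.0) / 2.0   # rescale to [0, 1]
    node = int(round(unit[0] * (n_nodes - 1)))          # discrete node index
    ratio, lease = float(unit[1]), float(unit[2])       # continuous sub-actions
    power = p_min + float(unit[3]) * (p_max - p_min)    # transmit power level
    return node, ratio, lease, power
```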
s1.5.3 prize value normalization:
the rewarding value consists of task time delay, vehicle energy consumption, resource leasing fee and load rewarding of the server, each sub rewarding value is normalized, and then the final rewarding value is obtained by calculating and summing by a scaling factor:
Figure BDA0004150744980000085
Figure BDA0004150744980000086
wherein r (i) max Upper limit value of sub-prize value, w i Scaling factors for the child prize values;
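A one-function Python sketch of this normalized, scaled reward, with the sign flipped so that a smaller cost yields a larger reward (consistent with the reward definition used later in the embodiment); names are illustrative:

```python
def reward(sub_values, sub_maxima, weights):
    """Normalised, scaled reward (S1.5.3): each sub-term (delay, energy,
    rental fee, load penalty) is divided by its upper bound, weighted by
    its scaling factor, summed, and negated."""
    return -sum(w * v / m for v, m, w in zip(sub_values, sub_maxima, weights))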
s1.5.4 a dual noise strategy network is introduced to obtain the experience trace of the agent:
the interaction process of the agent and the environment can be divided into two stages of exploration and utilization: in the exploration phase, the agent explores unknown states and actions in the environment by taking unknown, random strategies; in order to increase the exploration rate, the invention introduces a double-noise interference strategy, namely the output noise of a strategy network; secondly, environmental noise when the intelligent body interacts with the environment;
Figure BDA0004150744980000087
Figure BDA0004150744980000088
wherein u is a
Figure BDA0004150744980000089
Mean and standard deviation of policy network noise are respectively represented, u env
Figure BDA00041507449800000810
Respectively representing the mean value and standard deviation of the environmental noise;
combining strategy noise and environment interaction by the intelligent agent, accumulating track data meeting a specified threshold, recording the state, action, rewards and next state of each track by the intelligent agent, and integrating all track data into an experience pool for training;
s1.5.5 decision agent training:
splitting the experience pool obtained by S1.5.4 into small batches for training a TD3 model and training a strategy network: updating the parameter theta to maximize the value network evaluation value Q, and calculating the gradient through a chain rule:
Figure BDA0004150744980000091
updating the parameter θ by gradient ascent:
θ new ←θ now +β·g
particularly, alpha and beta are learning rates, gamma is discount rate, and the learning rates and the gamma are super parameters which need to be manually adjusted; the new mark represents the updated parameters of the network;
updating value network parameters; the TD error is the difference between the predicted value and the TD target:
Figure BDA0004150744980000092
defining a loss function as the mean square error of the predicted value and the smaller TD target in the two target value networks:
Figure BDA0004150744980000093
updating the value network parameters by adopting gradient descent;
Figure BDA0004150744980000094
policy delay update: after the predicted network is updated for h rounds, the parameters of the target network are updated,
Figure BDA0004150744980000095
phi is the weight ratio of the new parameter to the old parameter:
Figure BDA0004150744980000096
Figure BDA0004150744980000097
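The update equations above can be condensed into one PyTorch training step. The sketch below is a simplified rendering under stated assumptions (the value networks take a state-action pair, and one optimizer covers the parameters of both value networks); it is not the patent's exact implementation:

```python
import torch

def td3_update(q1, q2, pi, q1_t, q2_t, pi_t, batch,
               opt_q, opt_pi, gamma, phi, step, h):
    """One mini-batch update of the (simplified) S1.5.5 loop.

    q1/q2 are the value networks, pi the policy network, *_t their targets;
    `batch` is a (s, a, r, s2) tuple of tensors from the experience pool.
    """
    s, a, r, s2 = batch
    with torch.no_grad():
        y = r + gamma * torch.min(q1_t(s2, pi_t(s2)),
                                  q2_t(s2, pi_t(s2)))      # smaller TD target
    loss_q = ((q1(s, a) - y) ** 2).mean() + ((q2(s, a) - y) ** 2).mean()
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()     # gradient descent

    if step % h == 0:                                      # delayed policy update
        loss_pi = -q1(s, pi(s)).mean()                     # ascend on Q
        opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
        with torch.no_grad():                              # soft target update
            for net, tgt in ((q1, q1_t), (q2, q2_t), (pi, pi_t)):
                for p, p_t in zip(net.parameters(), tgt.parameters()):
                    p_t.mul_(1 - phi).add_(phi * p)
```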
The network training effect is tested each time the parameters are updated; S1.5.4 and S1.5.5 are then executed repeatedly until a preset exploration-step threshold is reached.

After network training is finished, performance evaluation indexes are designed to verify the effectiveness of the algorithm;
s1.5.6 decision agent evaluation:
and (3) evaluating the agent obtained by S1.5.5 through a strategy performance index, wherein the performance index has the following calculation formula:
(1) Average timeout rate:
Figure BDA0004150744980000098
(2) Average energy consumption:
Figure BDA0004150744980000101
(3) Average cost:
Figure BDA0004150744980000102
(4) Average load balancing coefficient:
Figure BDA0004150744980000103
wherein t is out (τ) represents the number of tasks for which slot τ times out; n (τ) represents the number of slot τ mission vehicles; t (T) end Representing the total number of task time slots; a (i) ∈ {0,1}, representing local computation when a (i) =0, and unloading computation when a (i) =1;
Figure BDA0004150744980000104
representing the load average value of the server when a certain task is unloaded;
in order to visually display the importance duty ratio of each performance index in a task unloading process and the overall system performance, the invention designs a system benefit function related to each time slot
Figure BDA0004150744980000105
The calculation formula is as follows:
Figure BDA0004150744980000106
wherein D is i Representing vehicle v i Is a task processing delay; c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Task processing costs of (a); l (L) i Is shown in the process of handling vehicle v i The server load balancing coefficient when the task is performed, the system benefit function is a negative number, and the larger the value is, the better the system performance is.
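A compact Python sketch of this per-slot benefit function, assuming the four per-vehicle terms are already normalized; names are illustrative:

```python
import numpy as np

def system_benefit(D, C, F, Lb, lam):
    """Per-slot system benefit U(tau): the negated weighted mean of the
    (already normalised) delay, energy, fee and load terms over the n(tau)
    task vehicles of the slot; values closer to 0 indicate better performance."""
    terms = np.stack([D, C, F, Lb])       # shape (4, n_tau)
    return -float(np.mean(np.asarray(lam) @ terms))
```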
Aiming at the problems of task unloading and resource scheduling in the Internet of Vehicles, the invention provides a multi-objective joint optimization task unloading strategy based on deep reinforcement learning. Targeting low time delay, low energy consumption, low cost and high resource utilization, the invention designs a multi-objective joint optimization strategy over the task time delay, vehicle energy consumption, resource lease fees and server load balancing coefficient. The control center first collects environment information (such as the available computing resources and load information of the MEC servers) and vehicle task request information (such as vehicle and task information); the SDN controller then makes task unloading and resource scheduling decisions. As task vehicles continuously feed back data, the control center continuously improves the decision network and rapidly adapts to the ever-changing edge computing environment.
The invention builds a system decision model based on an improved TD3 deep reinforcement learning algorithm and constructs a cluster-based multi-task, multi-objective partial-unloading environment. To improve decision accuracy, a dual-noise policy network is proposed to raise the exploration rate of the agent, and a hybrid action space is adopted to improve the adaptability of the algorithm. In addition, the method is applicable not only to the Internet of Vehicles but also to Internet of Things devices with edge computing requirements, and thus has broad application prospects. The method significantly improves the completion rate of tasks under a specified time-delay requirement, effectively reduces the task unloading cost and improves the utilization rate of system resources, thereby improving user satisfaction and system benefit.
Drawings
FIG. 1 is a simplified flow diagram of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles of the present invention;
FIG. 2 is a schematic diagram of a system model of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 3 is a schematic diagram of an unloading flow of a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 4 is an unloading flow chart of a multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 5 is a comparison diagram of different scheme training of a multi-objective joint optimization task offloading strategy based on deep reinforcement learning in the Internet of vehicles;
FIG. 6 is a time-out rate versus histogram of a multi-objective joint optimization task offloading strategy and comparison scheme based on deep reinforcement learning in the Internet of vehicles of the present invention;
FIG. 7 is a graph of average system benefit versus histogram of a multi-objective joint optimization task offloading strategy and comparison scheme based on deep reinforcement learning in the Internet of vehicles of the present invention;
Detailed Description
To better illustrate the technical effects of the invention, the invention is simulated and verified with a specific example, and specific embodiments are described in conjunction with the drawings so that those skilled in the art can better understand the invention. It is expressly noted that in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Examples:
taking a urban road with a length of 600 meters as an example, 3 equally-spaced 5G cellular base stations are distributed on the roadside, each base station is provided with a server with different computing performances, the coverage radius of each base station is 100 meters, the spacing between the base stations is 200 meters, and the specific implementation steps of the multi-optimization task unloading strategy in the vehicle edge computing environment comprise:
s101 decision agent training
The task unloading decision-making agent is trained based on the method. The decision network is designed as a 5-layer fully connected network, comprising 3 hidden layers, 1 input layer and 1 output layer. The number of neurons in each layer is specifically (12X 256X 4),
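A minimal PyTorch sketch of a network with these layer sizes; the class name and the choice of ReLU between hidden layers are assumptions, as the embodiment specifies only the layer widths and the tanh output:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """5-layer fully connected policy network (12-256-256-256-4) with a
    tanh head producing raw actions in [-1, 1]."""
    def __init__(self, state_dim: int = 12, action_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

# Usage: PolicyNet()(torch.zeros(1, 12)) -> tensor of shape (1, 4) in [-1, 1]
```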
data for network training originates from agent interactions with the environment, each data containing 4 values
Figure BDA0004150744980000121
Wherein->
Figure BDA0004150744980000122
The current state of the environment is represented, and the current state consists of the calculation frequency and the load coefficient of 3 edge servers, the vehicle position, the calculation frequency and the calculation power of the vehicle, the input data size of tasks, the required calculation amount and the maximum time delay;
Figure BDA0004150744980000123
The method comprises the steps of representing actions executed by an intelligent agent, and forming four parts of a task unloading node, a task unloading proportion, a resource leasing proportion and signal transmitting power; r represents rewards, and the agent is in status +.>
Figure BDA0004150744980000124
Execution of action down->
Figure BDA0004150744980000125
After this, r can be derived from the environment and enter the next state +.>
Figure BDA0004150744980000126
The intelligent agent judges whether the decision is good or bad through r, and awardsThe incentive is defined as the opposite of the offload cost, the smaller the cost, the greater the incentive:
Figure BDA0004150744980000127
the unloading cost consists of task time, vehicle energy consumption, resource lease cost and server load penalty value;
and dividing a continuous period of time into a plurality of discrete time nodes to obtain a system benefit function, and determining an optimal task unloading strategy according to the value and the distribution interval of the benefit function.
Figure BDA0004150744980000128
This example trains the decision agent based on the improved TD3 algorithm, which has six neural networks in total, comprising two value networks $q(s, a; w_i)$, $i \in \{1, 2\}$, and one policy network $u(s; \theta)$, each with a corresponding target network; the value network parameters $w_i$ and the policy network parameters $\theta$ are randomly initialized and assigned to the target networks. Combining the policy noise, the agent interacts with the environment, accumulates trajectory data up to a specified threshold and puts it into the experience pool, and then extracts data to start policy network training: update the parameters $\theta$ so that the value network evaluation $Q$ is maximized. The gradient is computed by the chain rule:

$g = \nabla_\theta \dfrac{1}{B} \sum_{s} q\big(s, u(s; \theta); w_1\big)$

Update the parameter $\theta$ by gradient ascent:

$\theta_{new} \leftarrow \theta_{now} + \beta \cdot g$

Update the value network parameters. The TD error is the difference between the predicted value and the TD target, where the TD target uses the smaller of the two target value networks:

$\delta_i = q(s, a; w_i) - y, \qquad y = r + \gamma \min_{j=1,2} q\big(s', u(s'; \theta^-); w_j^-\big)$

Define the loss function as the mean square error between the predicted value and the TD target:

$L(w_i) = \dfrac{1}{2B} \sum \big(q(s, a; w_i) - y\big)^2$

Update the value network parameters by gradient descent:

$w_{i,new} \leftarrow w_{i,now} - \alpha \cdot \nabla_{w_i} L(w_i)$

Policy delayed update: after the prediction networks have been updated for $h$ rounds, the parameters of the target networks are updated, with $\phi$ being the weight ratio of the new parameters to the old:

$\theta^- \leftarrow \phi \, \theta + (1 - \phi) \, \theta^-$

$w_i^- \leftarrow \phi \, w_i + (1 - \phi) \, w_i^-$
and testing the network training effect once every time the parameters are updated. The agent continuously explores the environment, thereby continuously updating the experience pool, and continuously updating the network parameters until reaching the preset exploration threshold. After training, the control center deploys decision-making agent.
S102: Task unloading request issue:

Task vehicle $v_i$ transmits its vehicle information and basic task information $I\{v, t\}$ to the control center through the nearest base station. The vehicle information is expressed as $v_i = \{i, p_i, f_i, k_i\}$ and the basic task information as $t_j = \{j, in_j, cal_j, t_j^{max}\}$.
s103: the control center collects information and makes decisions:
when the control center receives the unloading request, firstly, the key information in the request is extracted, and the states of the edge servers in the current time slot are integrated to form a state information vector together
Figure BDA0004150744980000138
Then inputting the motion vector into decision network to obtain a motion vector
Figure BDA0004150744980000139
The action vector is arranged to obtain an unloading scheme and a resource scheduling scheme of the task, and the unloading scheme and the resource scheduling scheme are respectively sent to the task request node and the service node.
S104: uploading data to complete the calculation task:
the task request node sends the complete information of the task to a designated service node according to an unloading scheme; and the service node executes the calculation task according to the resource scheduling scheme after receiving the task information, and returns the calculation result to the task vehicle.
S105: vehicle feedback information:
after the task is completed, the task vehicle feeds back the unloading result to the control center, and the control center integrates the feedback with the previous task request information and stores the integrated feedback and the previous task request information into a local database so as to update the decision network for use.
To verify the effectiveness and practicality of the invention, several comparison schemes were designed, as follows:
scheme 1: task random computation (Randomized Computing). The random calculation of the task means that the values of the task unloading node, the unloading proportion, the calculated lease proportion and the vehicle transmitting power are determined in a random mode in the task unloading process.
Scheme 2: The task is calculated locally (Local). The only computing participant is the task vehicle itself.
Scheme 3: Tasks are computed by multiple parties and task decisions are made based on DQN. DQN (Deep Q-Network) is a classical deep reinforcement learning algorithm for problems with discrete action spaces, widely used in reinforcement learning scenarios including video games and robot control.
Scheme 4: Tasks are computed by multiple parties and task decisions are made based on SAC. SAC (Soft Actor-Critic) is a deep reinforcement learning algorithm based on the maximum-entropy principle, designed for problems with continuous action spaces.
Scheme 5: Tasks are edge-computed and task decisions are made based on the algorithm herein (MEC_TD3). This scheme is a variant of the method of the invention in which task unloading decisions do not use the vehicle's own resources: the task unloading proportion is fixed to 1.
The method of the present invention is scheme 6. Table 1 lists the relevant experimental parameters. Different vehicle service scenarios, such as automatic driving and online entertainment, are simulated by designing different delay indexes, computation amounts and input data sizes for tasks; the diversity of environments is simulated by designing different vehicle positions and performances, and edge servers with different performances and load rates.
Table 1: Related experimental parameters (the table is reproduced only as an image in the source document).
To better illustrate the technical effects of the invention, this example adopts the following 4 performance indexes, calculated as follows:

(1) Average timeout rate:

$\bar{T}_{out} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{t_{out}(\tau)}{n(\tau)}$

(2) Average energy consumption:

$\bar{E} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} E_i$

(3) Average cost:

$\bar{F} = \dfrac{1}{T_{end}} \sum_{\tau=1}^{T_{end}} \dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} F_i$

(4) Average load balancing coefficient:

$\bar{L} = \dfrac{\sum_{\tau} \sum_{i} a(i) \, \bar{l}(i)}{\sum_{\tau} \sum_{i} a(i)}$

where $t_{out}(\tau)$ represents the number of tasks that time out in slot $\tau$; $n(\tau)$ represents the number of tasks in slot $\tau$; $a(i) \in \{0, 1\}$, with $a(i) = 0$ denoting local computation and $a(i) = 1$ denoting unloaded computation; and $\bar{l}(i)$ represents the average server load while a given task is unloaded.

To display intuitively the importance share of each performance index within one task unloading process, together with the overall system performance, the invention designs a per-slot system benefit function $U(\tau)$, calculated as:

$U(\tau) = -\dfrac{1}{n(\tau)} \sum_{i=1}^{n(\tau)} \big( \lambda_1 \widetilde{T_i} + \lambda_2 \widetilde{E_i} + \lambda_3 \widetilde{F_i} + \lambda_4 \widetilde{L_i} \big)$

where $T_i$ represents the task processing delay of vehicle $v_i$; $E_i$ represents the task processing energy consumption of vehicle $v_i$; $F_i$ represents the task processing cost of vehicle $v_i$; and $L_i$ represents the server load balancing coefficient while the task of vehicle $v_i$ is processed. The system benefit function is negative, and the larger its value, the better the system performance.
Each scheme was run 30 times, and the average of each index over the runs was computed. Table 2 shows the experimental results of the method of the invention and the other schemes.
Table 2: Experimental results (the table is reproduced only as an image in the source document).
From the table we can draw the following conclusions. The method of the invention performs best on several indexes, including timeout rate, energy consumption and load balancing, while also greatly improving the average system benefit. Further analysis of Table 2 shows the following. Scheme 1 adopts random unloading and may use local or cloud computation, which reduces the use of the edge servers to some extent, so it performs relatively well on rental cost and load balancing. Scheme 2 adopts purely local computation, so its energy consumption is the highest; because the computing resources of the vehicle are insufficient, its timeout rate is also the highest and its system benefit the lowest. Scheme 3 reduces the timeout rate, but the DQN algorithm cannot handle decision problems in a continuous action space, and task unloading decisions can only be obtained by discretizing it; owing to the resulting dimensionality problem and its limited decision quality, scheme 3 performs poorly on the two indexes of lease cost and load balancing, although it still exceeds schemes 1 and 2 in system benefit. The SAC algorithm of scheme 4 can handle decision problems in a continuous action space and achieves the best rental cost; its overall performance exceeds scheme 3 but is weaker than schemes 5 and 6. Scheme 5, although strong overall, is weaker than the method of the invention because it does not use the vehicle's own resources while edge computing resources are limited.
The technical effects of the present invention are next analyzed from the following two aspects:
(1) Effectiveness analysis
Firstly, the invention realizes multiparty cooperation and collaborative processing between vehicles and service nodes, improving the overall efficiency and performance of the Internet of Vehicles. Task unloading transfers the data processing services of a vehicle to edge or cloud nodes, and the invention achieves optimal resource utilization through efficient task unloading and resource allocation planning, thereby improving the overall efficiency and performance of the Internet of Vehicles.
Secondly, the invention effectively reduces the task unloading cost and improves resource utilization through its multi-objective joint optimization task unloading strategy. By making full use of the resources of both the vehicles and the service nodes to complete calculation tasks, it alleviates the problem of insufficient computing resources at the vehicle and the edge, prolongs the endurance of the vehicle, reduces the resource lease cost and improves the load balancing rate of the edge servers. Most importantly, by reducing the task timeout rate it lowers vehicle risk and improves driver safety when the vehicle handles services such as automatic driving. By taking into account the different computing capacities and load levels of different servers, it further improves the load balancing rate of the edge servers and achieves higher resource utilization.
In conclusion, the method is highly effective. Through multiparty cooperation and collaborative processing between vehicles and service nodes, and through the multi-objective joint optimization task unloading strategy, the invention alleviates to a certain extent the problem of insufficient computing resources at the vehicle and the edge, prolongs the endurance of the vehicle, reduces the resource lease cost and improves the resource utilization of the edge servers. Meanwhile, extensive experimental results show that the invention achieves notable improvements in system benefit compared with the other schemes. In addition, FIGs. 5, 6 and 7 demonstrate its superior performance.
(2) Adaptability analysis
First, the invention considers, at the system level, the heterogeneity among different vehicles and different servers: the hardware and software configurations of different vehicles may differ, and the computing performance and operating states of different edge servers may differ. During task unloading, tasks can therefore be unloaded according to the characteristics of the individual vehicles and servers so that they complete successfully.
Secondly, at the application level, the invention considers the different computing requirements of vehicles, such as those of automatic driving and online entertainment services, and selects the most suitable task unloading scheme for each requirement to achieve the best performance and effect.
In summary, the invention can adapt to different task vehicles, different edge servers and different calculation demands, and has higher adaptability.
While the foregoing is directed to embodiments of the present invention, other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow.

Claims (1)

1. A multi-objective joint optimization task unloading strategy based on deep reinforcement learning in the Internet of Vehicles, characterized by comprising the following steps:
s1.1, constructing a vehicle edge computing task unloading system model based on clusters:
aggregation of Internet of vehicles components in a certain area into cluster C s S epsilon {1, 2..s }, each cluster is independent of the other, each cluster can be regarded as an independent individual, and is mainly composed of the following four parts: (1) Main control base station integrating SDN
(2) M base stations providing edge computing services, $\{b_1, b_2, \dots, b_M\}$, each base station bound with an edge server; (3) $N_s$ task vehicles inside $C_s$, $\{v_1, v_2, \dots, v_{N_s}\}$; and (4) a cloud server $C_{server}$. The control center performs task unloading and resource scheduling control through a task unloading instruction, defined as:

$A = \{a_{node}, a_{ratio}, a_{lease}, a_{power}\}$

where the elements of the set respectively represent the task unloading node, the task unloading proportion, the resource lease proportion and the signal transmitting power.

The control center determines the task unloading instruction according to decision factors, which comprise the task time delay, vehicle energy consumption, resource lease cost and server load penalty value. The weight coefficients of the four parts are set in sequence as $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ and satisfy:

$0 < \lambda_i < 1, \qquad \sum_{i=1}^{4} \lambda_i = 1$
s1.2, building a basic model of each component:
s1.2.1, establishing a basic model of the task vehicle:
$v_i = \{i, p_i, f_i, k_i\}$

where $i$ is the serial number of the task vehicle, $p_i$ is the relative position of the vehicle, $f_i$ is the computing frequency of the vehicle, and $k_i$ is the computing power coefficient of the vehicle-mounted OBU;

S1.2.2 Establish the task model of the task vehicle:

$t_j = \{j, in_j, cal_j, t_j^{max}\}$

where $j$ is the task level, $j \in \{1, 2, \dots, J\}$, with $J$ levels in total; $in_j$ is the task input data amount; $cal_j$ is the task computation amount; and $t_j^{max}$ is the maximum expected time delay of the task. The higher the task level, the stricter the time-delay requirement and the higher the corresponding priority during task unloading;
s1.2.3 builds a basic model of the serving base station:
$b_m = \{m, f_m, p_m, l_m\}$

where $m$ is the base station number, $m \in \{1, 2, \dots, M\}$; $f_m$ is the computing frequency of the server; $p_m$ is the computing unit price, whose value is positively correlated with the real-time computing frequency of the task; and $l_m$ is the load factor of the base station;
s1.2.4 builds a basic model of the cloud server:
$c_c = \{f_c, p_c, t_c\}$

where $f_c$ is the computing frequency, $p_c$ is the computing unit price, and the data transmission delay $t_c$ follows the Gaussian distribution $t_c \sim \mathcal{N}(u_c, \sigma_c^2)$;
S1.3, establishing a task unloading model:
based on 5G cellular communication, a base station increases data transmission capability by using a Massive antenna array (Massive MIMO) technology, and models Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication by combining a Non-orthogonal frequency division multiple access (NOMA) technology. V2V communication is carried out between vehicles through a PC5 interface; V2I communication is carried out between the vehicle and the base station through a Uu interface; the base station, the control center and the cloud server are communicated with each other mainly through optical fibers; the base stations perform bidirectional interaction through a control center; task offloading two modes, one is total offloading; the second is to unload part, unload part according to task separability according to certain division rule unload part calculate task to the goal server to carry out; the target servers are a plurality of edge servers with higher performance and high-performance cloud servers with higher transmission delay;
the data transmission of the Internet of vehicles is carried out through the base station, and the base station is allocated with certain bandwidth resources which can be allocated to different vehicles for data transmission; to describe the fading characteristics of a multipath channel, the communication channel is modeled as a rayleigh channel; the allocation of the bandwidth resources by the base station needs to consider a plurality of factors, such as the communication requirement of the vehicle, the network congestion degree, the load of the base station and the like; the data transmission rate is defined as:
Figure FDA0004150744970000022
wherein the method comprises the steps of
Figure FDA0004150744970000023
Indicating base station b m Assigned to vehicle v i Bandwidth ρ of i Signal transmission power, delta, representing vehicle i Signal gain, sigma, representing the environment in which the vehicle is located m Representing standard deviation of white gaussian noise in the environment;
task offloading can be divided into local computing, edge computing and cloud computing according to different offloading modes and computing nodes;
s1.3.1 a local calculation model is built:
the time delay of local calculation is mainly task decision time delay and calculation time delay, and the task decision time delay t dec Satisfy Gaussian distribution
Figure FDA0004150744970000024
Figure FDA0004150744970000025
Calculating time delay:
Figure FDA0004150744970000031
and (3) locally calculating energy consumption:
Figure FDA0004150744970000032
defining a local calculation cost:
Figure FDA0004150744970000033
s1.3.2 an edge calculation model is built:
the time delay of the edge calculation is mainly task decision time delay, task uploading time delay, task transfer time delay and task execution time delay, and the feedback time delay of the calculation result is ignored;
Figure FDA0004150744970000034
edge calculation energy consumption:
Figure FDA0004150744970000035
edge computing rental fee:
Figure FDA0004150744970000036
edge calculation load penalty:
Figure FDA0004150744970000037
defining edge computation cost:
Figure FDA0004150744970000038
wherein L represents the length of the lane; t is t loss Representing the transmission loss rate;
Figure FDA0004150744970000039
representing task t j Is the actual calculation rate of (a), edge server b m Calculation rate and load rate l of (2) m Related to; a, b, c are coefficients of a binary linear function with respect to the load penalty value;
s1.3.3 an edge calculation model is built:
the time delay of cloud computing is mainly task decision time delay, task uploading time delay, task transmission time delay and task computing time delay, and the result feedback time delay is ignored;
Figure FDA0004150744970000041
task calculation time delay:
Figure FDA0004150744970000042
cloud computing rental fees:
Figure FDA0004150744970000043
defining cloud computing costs:
Figure FDA0004150744970000044
s1.4, establishing a multi-objective joint optimization model:
the system cost, namely the optimization target, is defined as the weighted sum of task delay, vehicle energy consumption, resource lease cost and server load penalty value;
Figure FDA0004150744970000045
wherein lambda is i Satisfy 0 < lambda i <1,
Figure FDA0004150744970000046
The symbol (x) represents the normalized prize value;
the task offloading policy is defined as: solving a set of service strategies that minimizes average system cost, i.e., maximizes rewards or system benefits, over a long period of time, thus the problem can be modeled as:
Figure FDA0004150744970000047
Figure FDA0004150744970000051
Figure FDA0004150744970000052
Figure FDA0004150744970000053
Figure FDA0004150744970000054
Figure FDA0004150744970000055
Figure FDA0004150744970000056
where n (τ) represents the number of computation tasks at τ slots; d (D) i Representing vehicle v i Task processing energy consumption of (a); c (C) i Representing vehicle v i Task processing energy consumption of (a); f (F) i Representing vehicle v i Resource lease costs of (2); l (L) i Is shown in the process of handling vehicle v i Server load balancing coefficients during the task; (a) The value range of each bonus weight coefficient is 0,1]And the sum is 1; (b) The value range of each sub-action is [0,1 ] for any action]The method comprises the steps of carrying out a first treatment on the surface of the (c) Representing the ratio of task offloading and the resources allocated by the server m for any vehicle at any time slotThe ratio is [0,1 ]]The method comprises the steps of carrying out a first treatment on the surface of the (d) Indicating that the sum of the proportion of resources allocated by the server m in any time slot is less than or equal to 1; (e) When the task is edge calculation, the task unloading proportion and the resource lease proportion are not equal to 0; (f) When the task is cloud computing, the task unloading proportion is 1;
S1.5 task unloading decision agent training and evaluation:
the invention adopts a deep reinforcement learning algorithm based on an improved twin delayed deep deterministic policy gradient algorithm (Twin Delayed Deep Deterministic Policy Gradient, TD3) for task unloading and resource scheduling decisions; TD3 is a deep reinforcement learning algorithm for continuous control tasks with six networks in total: two value networks q(s, a; w_i), i = 1, 2, and one policy network u(s; θ), each paired with a corresponding target network; the value network parameters w_i and the policy network parameters θ are randomly initialized and assigned to the target network parameters w_i⁻ and θ⁻, respectively;
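For concreteness, the six-network layout could be initialized as in the following PyTorch sketch; the network sizes and state/action dimensions are assumptions, not values from the filing.

```python
# Sketch of the six TD3 networks (two critics, one actor, their targets),
# initialized as described above; architecture sizes are assumptions.
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

state_dim, action_dim = 12, 4                                 # assumed dims
actor = nn.Sequential(mlp(state_dim, action_dim), nn.Tanh())  # u(s; theta)
critics = [mlp(state_dim + action_dim, 1) for _ in range(2)]  # q(s, a; w_i)

# Target networks start as exact copies of the randomly initialized networks.
actor_target = copy.deepcopy(actor)
critic_targets = [copy.deepcopy(c) for c in critics]
```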
S1.5.1 environmental input state preprocessing:
the environmental state is represented by a vector s, a multidimensional vector composed of the calculation frequency and load coefficient of each edge server, the vehicle position, the vehicle's calculation frequency and calculation power, and the task's input data size, required calculation amount, and maximum delay:
[Equation image FDA00041507449700000510: composition of the state vector s]
since the elements of the state vector differ in magnitude and order of magnitude, state normalization is needed; the following is applied to each vector element:
s̃_i = (s_i − s_i^min) / (s_i^max − s_i^min), i = 1, 2, …, I
where I represents the dimension of the environmental state vector s, and s_i^max and s_i^min respectively represent the upper and lower limit values of the i-th vector element;
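The per-element min-max scaling above maps each state component into [0,1]; a short sketch, assuming the bounds are known from the environment:

```python
# Min-max normalization of the environmental state vector, as in S1.5.1;
# the per-element bounds are assumed to be supplied by the environment.
import numpy as np

def normalize_state(s, s_min, s_max):
    """Map each element of s into [0, 1] using its known bounds."""
    s, s_min, s_max = map(np.asarray, (s, s_min, s_max))
    return (s - s_min) / (s_max - s_min + 1e-12)  # eps avoids division by zero
```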
S1.5.2 a hybrid action space:
the neural network outputs a 4-dimensional vector a representing the task offloading decision: the offloading node, the offloading proportion, the resource lease proportion, and the signal transmitting power; the network output uses tanh as the activation function and is then converted into the decision action of the agent:
[Equation images FDA0004150744970000066 and FDA0004150744970000067: conversion of the raw network output into the executable action]
where a′ denotes the raw output action of the neural network and a denotes the action the agent actually executes when interacting with the environment; the offloading-node component of a′ is further processed: with N_num the number of available task offloading nodes in the environment and round(x, 0) rounding x to an integer, the action space is converted from a continuous action space into a hybrid action space by discretizing the continuous action, expanding the application range of TD3; P_min and P_max respectively represent the lower and upper limit values of the vehicle communication power;
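One plausible realization of this conversion is sketched below; the exact affine mappings appear only as images in the filing, so the formulas here are assumptions consistent with the surrounding text (tanh outputs rescaled, node index rounded, power mapped into [P_min, P_max]).

```python
# Assumed continuous-to-hybrid action conversion: tanh outputs in [-1, 1]
# are rescaled to [0, 1], the node component is rounded to an integer index
# via round(x, 0), and the power is mapped into [p_min, p_max].

def to_env_action(a_raw, n_nodes, p_min, p_max):
    unit = [(x + 1.0) / 2.0 for x in a_raw]        # [-1, 1] -> [0, 1]
    node = int(round(unit[0] * (n_nodes - 1), 0))  # discrete offloading node
    offload_ratio = unit[1]                        # continuous, in [0, 1]
    lease_ratio = unit[2]                          # continuous, in [0, 1]
    power = p_min + unit[3] * (p_max - p_min)      # transmit power
    return node, offload_ratio, lease_ratio, power
```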
S1.5.3 reward value normalization:
the reward consists of the task delay, vehicle energy consumption, resource lease fee, and server load reward; each sub-reward value is normalized, and the final reward value is obtained by scaling each sub-reward with its scaling factor and summing:
[Equation images FDA00041507449700000611 and FDA00041507449700000612: sub-reward normalization and weighted summation]
where r(i)_max is the upper limit value of the i-th sub-reward and w_i is the scaling factor of the i-th sub-reward;
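A one-line sketch of this composition; the exact normalization formula is an image in the filing, so dividing each sub-reward by its upper limit r(i)_max is an assumption consistent with the text.

```python
# Assumed reward composition of S1.5.3: each sub-reward normalized by its
# upper limit, then combined with its scaling factor w_i.

def total_reward(sub_rewards, r_max, w):
    return sum(w_i * (r_i / m_i)
               for r_i, m_i, w_i in zip(sub_rewards, r_max, w))
```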
S1.5.4 a dual-noise policy network is introduced to obtain the experience trajectories of the agent:
TD3 is an off-policy algorithm; in the exploration phase, the agent explores unknown states and actions in the environment by taking exploratory, random strategies; to increase the exploration rate, the invention introduces dual-noise interference: first, output noise of the policy network; second, environmental noise when the agent interacts with the environment;
[Equation image FDA0004150744970000071: action with policy-network noise]
[Equation image FDA0004150744970000072: action with environmental noise]
where u_a and σ_a respectively represent the mean and standard deviation of the policy network noise, and u_env and σ_env respectively represent the mean and standard deviation of the environmental noise;
combining the policy noise with environment interaction, the agent accumulates trajectory data until a specified threshold is met, recording the state, action, reward, and next state of each step; all trajectory data are integrated into an experience pool for training;
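The collection loop could look like the following sketch, where Gaussian noise is applied twice, once at the policy output and once at the environment boundary; `env` and the noise parameters are hypothetical placeholders.

```python
# Sketch of the dual-noise exploration loop of S1.5.4: Gaussian noise is
# added to the policy output and again to the action applied in the
# environment; each transition is stored in the experience pool.
import collections
import numpy as np
import torch

replay = collections.deque(maxlen=100_000)  # experience pool

def explore(env, actor, steps, u_a, sigma_a, u_env, sigma_env):
    s = env.reset()
    for _ in range(steps):
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32)).numpy()
        a = a + np.random.normal(u_a, sigma_a, a.shape)      # policy noise
        a = a + np.random.normal(u_env, sigma_env, a.shape)  # environment noise
        a = np.clip(a, -1.0, 1.0)
        s_next, r, done = env.step(a)
        replay.append((s, a, r, s_next))  # state, action, reward, next state
        s = env.reset() if done else s_next
```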
S1.5.5 decision agent training:
the experience pool obtained in S1.5.4 is split into mini-batches for training the TD3 model; training the policy network: the parameter θ is updated to maximize the value network evaluation Q, with the gradient computed via the chain rule:
[Equation image FDA0004150744970000075: policy gradient g via the chain rule]
the parameter θ is then updated by gradient ascent:
θ_new ← θ_now + β·g
in particular, α and β are learning rates and γ is the discount rate; these are hyperparameters that must be tuned manually; the subscript "new" denotes the updated network parameters;
the value network parameters are updated next; the TD error is the difference between the predicted value and the TD target:
[Equation image FDA0004150744970000076: TD target, using the smaller output of the two target value networks]
the loss function is defined as the mean square error between the predicted value and the smaller TD target of the two target value networks:
[Equation image FDA0004150744970000077: value-network loss function]
the value network parameters are updated by gradient descent:
[Equation image FDA0004150744970000078: gradient-descent update of w_i]
policy delay update: after the prediction networks have been updated for h rounds, the target network parameters are updated, with φ the weight ratio of the new parameters to the old:
θ⁻ ← φ·θ + (1 − φ)·θ⁻
w_i⁻ ← φ·w_i + (1 − φ)·w_i⁻, i = 1, 2
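These update rules (chain-rule policy gradient, TD target from the smaller target critic, MSE loss with gradient descent, and φ-weighted delayed target updates) correspond to the standard TD3 update step; the PyTorch sketch below is one possible realization, with the hyperparameters (gamma, phi, h) and network/optimizer handles assumed rather than taken from the filing.

```python
# Sketch of one TD3 update (S1.5.5): clipped double-Q TD target, MSE critic
# loss minimized by gradient descent, delayed actor update (gradient ascent
# on Q realized as descent on -Q), and soft phi-weighted target updates.
import torch
import torch.nn.functional as F

def td3_update(batch, actor, actor_t, critics, critic_ts,
               actor_opt, critic_opts, step, gamma=0.99, phi=0.005, h=2):
    s, a, r, s2 = batch  # mini-batch tensors drawn from the experience pool
    with torch.no_grad():
        a2 = actor_t(s2)
        # TD target uses the smaller of the two target value networks
        q_next = torch.min(*(ct(torch.cat([s2, a2], 1)) for ct in critic_ts))
        y = r + gamma * q_next
    for critic, opt in zip(critics, critic_opts):  # gradient descent on MSE
        loss = F.mse_loss(critic(torch.cat([s, a], 1)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    if step % h == 0:  # delayed policy update after h critic rounds
        actor_loss = -critics[0](torch.cat([s, actor(s)], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        with torch.no_grad():  # soft update: target <- phi*new + (1-phi)*old
            for net, tgt in zip([actor, *critics], [actor_t, *critic_ts]):
                for p, pt in zip(net.parameters(), tgt.parameters()):
                    pt.mul_(1 - phi).add_(phi * p)
```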
the network training effect is tested each time the parameters are updated; S1.5.4 and S1.5.5 are then executed repeatedly until a preset exploration-step threshold is reached;
after network training is finished, performance evaluation indexes are designed to verify the effectiveness of the algorithm;
S1.5.6 decision agent evaluation:
the agent obtained in S1.5.5 is evaluated through strategy performance indexes, calculated as follows:
(1) Average timeout rate:
[Equation image FDA0004150744970000083]
(2) Average energy consumption:
[Equation image FDA0004150744970000084]
(3) Average cost:
[Equation image FDA0004150744970000085]
(4) Average load balancing coefficient:
[Equation image FDA0004150744970000086]
where t_out(τ) represents the number of tasks that time out in slot τ; n(τ) represents the number of task vehicles in slot τ; T_end represents the total number of task time slots; a(i) ∈ {0,1}, where a(i) = 0 denotes local computation and a(i) = 1 denotes offloaded computation; the overlined load symbol (equation image FDA0004150744970000087) represents the average server load when a given task is offloaded;
in order to visually display the relative importance of each performance index in one task offloading process, as well as the overall system performance, the invention designs a system benefit function (equation-image symbol FDA0004150744970000088) defined over each time slot within a finite time; its calculation formula is:
[Equation image FDA0004150744970000089: system benefit function]
where D_i represents the task processing delay of vehicle v_i; C_i represents the task processing energy consumption of vehicle v_i; F_i represents the task processing cost of vehicle v_i; L_i represents the server load balancing coefficient while processing the task of vehicle v_i; the system benefit function is negative, and the larger its value, the better the system performance.
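The evaluation metrics could be computed from per-slot logs as in the following sketch; the exact averaging formulas are images in the filing, so the per-slot averaging and the negative weighted sum below are assumptions consistent with the surrounding text.

```python
# Sketch of the S1.5.6 evaluation metrics, assuming per-slot logs of
# timeouts t_out(tau) and task counts n(tau), plus per-task records of
# energy, cost, and load coefficients.
import numpy as np

def average_timeout_rate(t_out, n):  # per-slot timeout ratio, averaged
    t_out, n = np.asarray(t_out), np.asarray(n)
    return float(np.mean(t_out / np.maximum(n, 1)))

def average_of(per_task_values):  # average energy, cost, or load coefficient
    return float(np.mean(per_task_values))

def system_benefit(D, C, F, L, weights):
    """Negative weighted sum of delay, energy, cost, and load terms;
    a larger (less negative) value indicates better system performance."""
    return -sum(w * x for w, x in zip(weights, (D, C, F, L)))
```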
The advantages of the invention are:
Firstly, in task unloading and resource scheduling decisions the invention considers not only user demands such as task delay, vehicle energy consumption, and calculation expenditure, but also the load condition of the servers; by improving the load balancing rate of the servers it raises resource utilization, which brings significant economic benefit.
Secondly, the task unloading problem of the Internet of Vehicles is studied in combination with the characteristics of the 5G cellular network; exploiting the high speed, low delay, high capacity, wide bandwidth, and other characteristics of 5G greatly reduces the delay of task unloading and improves user satisfaction.
Finally, an improved TD3 deep reinforcement learning algorithm is adopted for task unloading and resource scheduling decisions: (1) a dual-noise policy network improves the exploration rate of the agent; (2) the hybrid action space expands the application range of TD3. Deep reinforcement learning trades offline training time for online decision cost, greatly reducing decision complexity and improving decision efficiency. The invention has important application prospects and can play an important role in the field of the Internet of Vehicles.
CN202310318141.7A 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles Pending CN116321298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310318141.7A CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310318141.7A CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Publications (1)

Publication Number Publication Date
CN116321298A true CN116321298A (en) 2023-06-23

Family

ID=86797722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310318141.7A Pending CN116321298A (en) 2023-03-29 2023-03-29 Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles

Country Status (1)

Country Link
CN (1) CN116321298A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117042051A (en) * 2023-08-29 2023-11-10 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles
CN117042051B (en) * 2023-08-29 2024-03-08 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles
CN117412349A (en) * 2023-12-13 2024-01-16 湖南大学无锡智能控制研究院 Service switching method, device and system based on edge server performance

Similar Documents

Publication Publication Date Title
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN116321298A (en) Multi-objective joint optimization task unloading strategy based on deep reinforcement learning in Internet of vehicles
CN113435472A (en) Vehicle-mounted computing power network user demand prediction method, system, device and medium
CN114189892A (en) Cloud-edge collaborative Internet of things system resource allocation method based on block chain and collective reinforcement learning
CN113568675A (en) Internet of vehicles edge calculation task unloading method based on layered reinforcement learning
CN114143346B (en) Joint optimization method and system for task unloading and service caching of Internet of vehicles
CN113132943B (en) Task unloading scheduling and resource allocation method for vehicle-side cooperation in Internet of vehicles
CN113543074A (en) Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
CN115134242B (en) Vehicle-mounted computing task unloading method based on deep reinforcement learning strategy
Lv et al. Edge computing task offloading for environmental perception of autonomous vehicles in 6G networks
Gao et al. Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing
CN115052262A (en) Potential game-based vehicle networking computing unloading and power optimization method
CN117221950A (en) Vehicle task unloading method and system based on deep reinforcement learning
CN116566838A (en) Internet of vehicles task unloading and content caching method with cooperative blockchain and edge calculation
CN117221951A (en) Task unloading method based on deep reinforcement learning in vehicle-mounted edge environment
CN116582836B (en) Task unloading and resource allocation method, device, medium and system
CN117290071A (en) Fine-grained task scheduling method and service architecture in vehicle edge calculation
CN115865914A (en) Task unloading method based on federal deep reinforcement learning in vehicle edge calculation
CN114928826A (en) Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
CN115208892A (en) Vehicle-road cooperative online task scheduling method and system based on dynamic resource demand
Chouikhi et al. Energy-Efficient Computation Offloading Based on Multi-Agent Deep Reinforcement Learning for Industrial Internet of Things Systems
CN114584951A (en) Combined computing unloading and resource allocation method based on multi-agent DDQN
CN118113484B (en) Resource scheduling method, system, storage medium and vehicle
Sun et al. Deep Reinforcement Learning for Energy Minimization in Multi-RIS-Aided Cell-Free MEC Networks
CN118612754B (en) Three-in-one terminal control system and method capable of intelligent networking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination