CN117528649A - Method for establishing an end-edge cloud system architecture, task offloading and resource allocation optimization method, and end-edge cloud system architecture

Method for establishing an end-edge cloud system architecture, task offloading and resource allocation optimization method, and end-edge cloud system architecture

Info

Publication number
CN117528649A
Authority
CN
China
Prior art keywords: task, server, edge, energy consumption, local
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202310780228.6A
Other languages
Chinese (zh)
Inventor
曾令秋
胡晗
韩庆文
雷瑜
叶蕾
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202310780228.6A
Publication of CN117528649A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 Network traffic management; Network resource management
    • H04W 28/02 Traffic management, e.g. flow control or congestion control
    • H04W 28/08 Load balancing or load distribution
    • H04W 28/09 Management thereof
    • H04W 28/0925 Management thereof using policies
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W 4/44 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • H04W 72/00 Local resource management
    • H04W 72/50 Allocation or scheduling criteria for wireless resources
    • H04W 72/535 Allocation or scheduling criteria for wireless resources based on resource usage policies


Abstract

The invention belongs to the technical field of intelligent transportation, and discloses a method for establishing an end-edge cloud system architecture, a task offloading and resource allocation optimization method, and the end-edge cloud system architecture. In the task offloading and resource allocation optimization method, every vehicle sends the task information it randomly generates in each time interval to a decision maker server; the task migration cost is calculated, a task offloading model and a task migration model are established, and an objective function is built; the state space and action space of a deep reinforcement learning algorithm are constructed; and a reward function is constructed according to the optimization objective and solved, so that task offloading and resource allocation are optimized. With this scheme, a task offloading and task migration model is proposed, the task migration problem is solved with a set-based particle swarm optimization algorithm, and the joint decision problem of task offloading and resource allocation is solved with a deep reinforcement learning algorithm, which minimizes the long-term overhead of the system and accelerates convergence.

Description

Method for establishing an end-edge cloud system architecture, task offloading and resource allocation optimization method, and end-edge cloud system architecture
Technical Field
The invention belongs to the technical field of intelligent transportation, and relates to a method for establishing an end-edge cloud system architecture, a task offloading and resource allocation optimization method, and the end-edge cloud system architecture.
Background
With the rapid development of intelligent transportation, novel vehicle-mounted applications and services such as intelligent driving assistance, real-time road conditions, image navigation and in-vehicle entertainment have been introduced in large numbers. Because the computing and storage capabilities of local on-board devices are fixed and limited, it is often difficult to meet the stringent latency and computing resource requirements of such tasks through local vehicle processing alone in the face of the explosive growth of intelligent on-board services. Introducing multi-access edge computing (MEC) for vehicles under the end-edge cloud collaborative architecture provides powerful service capabilities for road vehicles.
Task offloading and resource allocation have long been research hotspots and difficulties in the fields of MEC and end-edge cloud collaboration; offloading tasks to resource-rich edge servers and cloud servers for execution can better meet the delay and energy consumption demands of tasks. A reasonable task offloading policy can improve the network performance and quality of service (QoS) of the end-edge cloud architecture.
The dynamics of the data requirements and computation requirements of vehicle-mounted tasks, and of the available resources of the edge servers, pose challenges for task offloading policies. In the task offloading process, the limits of bandwidth capacity and computing resources must be fully considered to ensure that the computation delay and communication transmission delay can meet the response requirements of vehicle-mounted applications.
Therefore, efficient task execution under the end-edge cloud architecture requires, on the one hand, a full analysis of vehicle-mounted application requirements, and on the other hand, matching the task offloading position, offloading proportion and resource allocation to those requirements; that is, the offloading position, offloading amount and communication resource allocation must be studied. In addition, when the computing and communication resources of the current MEC server cannot satisfy the current task, migrating the task to another MEC server for processing can improve the task success rate and reduce the system overhead.
In the prior art, end-edge cloud computation offloading and resource allocation optimization schemes based on DQN (Deep Q-Network, a Q-learning algorithm based on deep learning) do not reflect the differing preferences of tasks for delay and energy consumption. Moreover, DRL (deep reinforcement learning) methods such as DQN, Q-learning and DDQN (Double DQN) suffer from unavoidable over-estimation of the Q value and slow convergence, and struggle to perform well in large state-action spaces: the greedy strategy they use brings higher estimation variance, and bootstrapping causes over-estimation errors to accumulate during training, further inflating the Q-value estimates. These methods also do not adopt a priority-based experience replay mechanism, so their learning efficiency is low and their convergence speed is slow.
Disclosure of Invention
The invention aims to provide a method for establishing an end-edge cloud system architecture, a task offloading and resource allocation optimization method, and the end-edge cloud system architecture, so as to obtain optimal task offloading and resource allocation decisions and improve the convergence speed.
In order to achieve the above purpose, the basic scheme of the invention is as follows: a method of establishing an end-edge cloud system architecture, comprising the steps of:
deploying road side units and base stations along a single-lane straight road, with adjacent base stations spaced L meters apart and each base station, whose signal coverage radius is L/2, equipped with an edge server;
the vehicles realize data interaction with the road side units and establish connections with the base stations;
the system comprises K local vehicles, M edge servers and a cloud server; in each time interval, each vehicle randomly generates a task

$task_i(t) = \{b_i, c_i, T_i^{max}\}$

wherein $b_i$ is the task data volume, $c_i$ is the amount of computation required per unit of task data, and $T_i^{max}$ is the tolerable delay of the task; $i$ denotes the $i$-th vehicle and $t$ the $t$-th time slot;

the local task set of time slot $t$ is:

$Task(t) = \{task_1(t), task_2(t), \ldots, task_K(t)\}$
in each time interval, all vehicles send their own states to a decision maker server;
the decision maker server collects the available computing and communication resource situation and the state information of all edge servers in the end-edge cloud system at the current moment, and then selects an action decision based on the current state;
after the decision maker server makes the action decision, the result is returned to the local vehicles; the local vehicles execute the tasks according to the returned result, offloading them to the corresponding computing units; the system overhead is counted, and the reward is computed and returned to the decision maker server.
The working principle and beneficial effects of the basic scheme are as follows: the technical scheme establishes an end-edge cloud system architecture with multiple mobile devices and multiple edge servers; the decision maker server performs global observation and makes action decisions by combining the remaining communication and computing resources of the edge servers, which is convenient to use.
The invention also provides a task offloading and resource allocation optimization method, comprising the following steps:

according to the above establishment method, establishing the end-edge cloud system architecture; in each time interval, all vehicles send the task information randomly generated at the current moment and their own state to the decision maker server, the task types being general, computation-intensive, data-intensive or delay-sensitive;

based on the characteristics of the different computing tasks, taking the delay and energy consumption preference factors into account and introducing a load balancing factor, calculating the task migration cost and constructing a task offloading model and a task migration model;

establishing an objective function using the task offloading model and the task migration model;

constructing a state space and an action space;

and constructing a reward function according to the optimization objective, and solving it to realize task offloading and resource allocation optimization.
The technical scheme defines the computation models of delay and energy consumption for tasks at different layers of the end-edge cloud architecture. Meanwhile, considering that the edge server currently accessed by a vehicle may be unable to serve it, a target edge server for offloading is selected, the completion rate of vehicle tasks within their tolerable delay is guaranteed, and the task migration algorithm among the serving edge servers is optimized, meeting the demands of completing computation-intensive tasks with low delay and relieving excessive computing resource load, with a fast convergence speed.
Further, the method for constructing the task offloading model comprises:

the delay and processing energy consumption of offloading a task to different devices for execution comprise the delay and energy consumption of local computation; the computation delay and energy consumption on an MEC server reached by offloading over the wireless network; and the delay and energy consumption of offloading to the cloud server;

the local computation delay $T_i^{local}$ and energy consumption $E_i^{local}$ of the task are:

$T_i^{local} = \frac{\alpha_i^{local} b_i c_i}{f_i^{local}}, \quad E_i^{local} = k_{local} (f_i^{local})^2 \alpha_i^{local} b_i c_i$

wherein $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ respectively represent the proportions of the task offloaded to the local vehicle, the edge server and the cloud server; $f_i^{local}$ represents the computing capacity of the vehicle; $k_{local}$ is the effective capacitance switching factor of the chip architecture, i.e. the inherent energy consumption coefficient of the local vehicle's CPU; $b_i$ is the task data volume and $c_i$ the amount of computation required per unit of task data;

the delay of offloading task $i$ to the edge server consists of three parts: the data upload delay $T_i^{up,edge}$, the processing delay $T_i^{exe,edge}$ and the result return delay; the energy consumption of offloading the task to the MEC edge server consists of the data upload energy consumption $E_i^{up,edge}$, the processing energy consumption $E_i^{exe,edge}$ and the return energy consumption. The delay $T_i^{edge}$ and energy consumption $E_i^{edge}$ of offloading the task onto the MEC edge server are expressed as:

$T_i^{edge} = T_i^{up,edge} + T_i^{exe,edge} = \frac{\alpha_i^{edge} b_i}{B_{i,j} r_{i,j}} + \frac{\alpha_i^{edge} b_i c_i}{Cpt_{i,j} f_j^{edge}}$

$E_i^{edge} = p_i T_i^{up,edge} + k_{edge} (f_j^{edge})^2 \alpha_i^{edge} b_i c_i$

wherein $B_{i,j}, Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to the task, i.e. the shares of communication and computing resources that the $j$-th edge server allocates to task $i$; $k_{edge}$ is the inherent energy consumption coefficient of the edge server's CPU and $f_j^{edge}$ the computing capacity of the corresponding edge server; $r_{i,j}$ is the maximum uplink data rate between the local vehicle of task $i$ and the edge server; $p_i$ is the signal transmit power of the local vehicle;

when the task is offloaded to the cloud server for execution, the task data is first uploaded to the MEC edge server and then relayed from the MEC edge server to the cloud server; the transmission delay from the edge server to the cloud server is taken as a fixed value FD, and the delay $T_i^{cloud}$ and energy consumption $E_i^{cloud}$ of offloading the task to the cloud server are expressed as:

$T_i^{cloud} = T_i^{up,cloud} + FD, \quad E_i^{cloud} = p_i T_i^{up,cloud}$

wherein $T_i^{up,cloud} = \frac{\alpha_i^{cloud} b_i}{B_{i,j} r_{i,j}}$ is the delay of uploading the data to the cloud server.
Constructing the task offloading model improves the network performance and service quality of the end-edge cloud architecture.
Further, the specific method for constructing the task migration model is:

the task is migrated as a whole; during migration, the migration delay $T_i^{mig}$ is:

$T_i^{mig} = T_i^{up,near} + T_i^{trans} + T_i^{exe}$

wherein $T_i^{up,near}$ represents the upload delay to the nearest edge server, $T_i^{trans}$ represents the transmission delay of carrying the offloaded task to the target edge server chosen by the particle swarm algorithm, and $T_i^{exe}$ represents the execution delay.
The operation is simple and convenient to apply.
Further, a set-based particle swarm optimization algorithm is adopted to select the server when offloading tasks:

the particle position code is set to represent the edge server selection scheme of all current tasks, recorded as $X = (x_1, x_2, \ldots, x_K)$; if there are $m$ edge servers in total, then $x_i = j\ (1 \le j \le m)$ represents that the $i$-th task selects edge server $j$ as the offloading target at the edge layer, and the velocity of a particle represents the tendency of the current task to select other edge servers for offloading;

the specific steps are as follows:

S1, initializing the particle swarm, including the swarm size and the velocity and position of each particle, and simultaneously initializing the individual best position of each particle and the global best position;

S2, triggering one iteration each time a request arrives; combining the current task attributes and the available resources of each edge server, the fitness of each particle is evaluated according to the fitness function

$F = \sum_{i=1}^{K}\left(T_i^{mig} + \varphi_i\right), \quad \varphi_i = \begin{cases} G, & T_i^{mig} > T_i^{max} \\ 0, & \text{otherwise} \end{cases}$

wherein $K$ is the number of local vehicles, $T_i^{max}$ represents the maximum tolerable delay of the task, $T_i^{mig}$ is the migration delay, $\varphi_i$ denotes the penalty factor, and $G$ is the timeout penalty coefficient;

S3, updating the individual best position of each particle: if the fitness value of the particle's current position is better than that of its historical best position, the current position is set as the new individual best position;

S4, updating the global best position: among the individual best positions of all particles, the position with the best fitness value is selected as the global best position; the swarm best position after this round of iterative updating is taken as the solution to the request, and parsing the swarm best position code gives the serial number of the edge server selected for each task in the current request, realizing the set-based particle swarm edge server selection;

S5, updating the particle velocity and position according to

$v_i^{t+1} = \omega v_i^t + cp \cdot random \cdot (pbest_i - x_i^t) + cg \cdot random \cdot (gbest - x_i^t)$

$x_i^{t+1} = x_i^t + v_i^t$

wherein $x_i^t$ represents the position of particle $i$ at the $t$-th iteration; $v_i^{t+1}$ and $v_i^t$ are the velocities of particle $i$ at iterations $t+1$ and $t$; $\omega v_i^t$ represents the inertial direction, i.e. the influence of the previous velocity on the current velocity, where a larger inertia weight $\omega$ gives a stronger global exploration capability; $cp$ is the individual learning factor, representing the propulsion of the particle towards the individually known local optimum, and $cg$ is the swarm learning factor, representing the propulsion of the particle towards the swarm's global optimum; $random$ denotes a random number between 0 and 1, drawn independently for each term; at each iteration, the particles move under the combined action of the previous movement tendency, self-cognition and swarm cognition, gradually converging to the global best position, and the current position of a particle is determined jointly by its previous position and its velocity;

S6, judging whether a new request is generated; if there is no request, the algorithm ends, otherwise return to step S2.
The set-based particle swarm algorithm is used to select the target edge server for offloading, guaranteeing the completion rate of vehicle tasks within their tolerable delay and optimizing the task migration algorithm among the serving edge servers, meeting the demands of completing computation-intensive tasks with low delay and relieving excessive computing resource load.
Further, the method for establishing the objective function comprises:

the total delay $T_i^{total}$ and total energy consumption $E_i^{total}$ of a single task are:

$T_i^{total} = \max\left(T_i^{local},\ T_i^{edge},\ T_i^{cloud}\right), \quad E_i^{total} = E_i^{local} + E_i^{edge} + E_i^{cloud}$

wherein $T_i^{local}, E_i^{local}$ are the delay and energy consumption of local execution; $T_i^{edge}, E_i^{edge}$ are the delay and energy consumption of offloading the task to the edge server; and $T_i^{cloud}, E_i^{cloud}$ are the delay and energy consumption of offloading the task to the cloud server;

taking the trade-off between delay and energy consumption as the optimization target, the overhead $C_i(t)$ of task $task_i(t)$ is defined as:

$C_i(t) = w_i^T T_i^{total} + w_i^E E_i^{total}$

wherein $w_i^T$ and $w_i^E$ respectively denote the delay weight and the energy consumption weight, which can be dynamically adjusted according to the delay and energy consumption demands of different tasks;

the optimization objective is to minimize the overhead of the tasks generated by the K vehicles during time T, expressed as:

$\min \sum_{t=1}^{T} \sum_{i=1}^{K} C_i(t)$

$\text{s.t.}\ C1: 0 \le \alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud} \le 1, \ \forall i$

$C2: \alpha_i^{local} + \alpha_i^{edge} + \alpha_i^{cloud} = 1, \ \forall i$

$C3: T_i^{total} \le T_i^{max}, \ \forall i$

$C4: \sum_{i} Cpt_{i,j} \le 1, \ \forall j$

$C5: \sum_{i} B_{i,j} \le 1, \ \forall j$

$C6: 0 \le w_i^T, w_i^E \le 1, \ w_i^T + w_i^E = 1$

wherein $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ respectively represent the proportions of the task offloaded to the local vehicle, the edge server and the cloud server, and $B_{i,j}, Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to the task, i.e. the shares of communication and computing resources that the $j$-th edge server allocates to task $i$; C1 and C2 constrain the value range of the task offloading proportion decision, meaning the task can be divided in any proportion; C3 states that the processing delay of a task does not exceed its maximum tolerable delay, ensuring the task responds on time; constraint C4 states that the sum of computing resources each MEC server allocates to the tasks it serves does not exceed the maximum resources of that edge server; similarly, constraint C5 states that the sum of communication resources each MEC server allocates to the tasks within its coverage does not exceed its own maximum communication resources; constraint C6 gives the value range of the delay and energy consumption weights.
Establishing the objective function and setting multiple constraints facilitates the subsequent solution of task offloading and resource allocation.
Further, the method of building the state space is as follows:

the state space $s_t$ is:

$s_t = \{Task(t), F_{edge}(t), F_{local}(t), R(t)\}$

wherein $F_{edge}(t) = \{f_1^{edge}(t), \ldots, f_M^{edge}(t)\}$ represents the computing resources available at all edge servers at the current moment, $f_j^{edge}$ being the computing capacity of the corresponding edge server; $F_{local}(t) = \{f_1^{local}, \ldots, f_K^{local}\}$ represents the computing resources available at all local vehicles at the current moment, $f_i^{local}$ being the computing capacity of the vehicle; $R(t)$ is a $K \times M$ matrix whose element $r_{i,j}$ represents the maximum uplink data rate between the $i$-th local vehicle and the $j$-th edge server, with $r_{i,j} = 0$ if the vehicle is not within the coverage of the $j$-th base station.
The global observation of the decision maker server forms the state space, which contains the task information set at the current moment, the communication resources available at each edge server, the computing resources of all local vehicles, and the maximum data upload rates between all local vehicles and the edge servers, so that the long-term system overhead can be minimized.
Further, the method for constructing the action space comprises:

each action $a_t$ in the action space is defined as:

$a_t = \{\alpha_1(t), \ldots, \alpha_K(t), B_1(t), \ldots, B_M(t), Cpt_1(t), \ldots, Cpt_M(t)\}$

wherein $\alpha_i(t)$ represents the offloading proportion decision of the $i$-th task, and $B_i(t)$ and $Cpt_i(t)$ respectively represent the communication resource allocation decision and the computing resource allocation decision of the $i$-th edge server;

the actions that would operate an edge server with no remaining communication or computing resources are removed from the action space, yielding the effective action space;

after introducing the effective action space, when selecting an action with the $\epsilon$-greedy strategy, it is judged whether the action belongs to the effective action space; if not, the action is re-selected; meanwhile, in each time interval, the effective action space is updated in combination with the resource situation of the edge servers.
After the current state selects and executes an action in the action space, it transitions to a new state in the state space, and the environment gives the agent a reward according to the defined reward function to guide the selection of subsequent actions.
Further, the method of constructing and solving the reward function is as follows:

the optimization objective is to minimize the delay and energy consumption of tasks, minimize the system overhead and optimize the task success rate, so the reward function $R_t$ is defined as:

$R_t = -\sum_{i=1}^{K} C_i(t)$

wherein $task_i(t)$ represents the task of the $i$-th vehicle at the $t$-th time slot, $K$ is the number of local vehicles, and $C_i(t)$ is the system overhead of task $task_i(t)$;

the task offloading decision and resource allocation process under the multi-constraint condition is solved with the PERDDQN algorithm:

randomly initialize the current Q network parameters $\theta_t$, initialize the parameters of the target network $Q'$ as $\theta'_t = \theta_t$, initialize the effective action space, and initialize the exploration rate $\epsilon$;

from episode = 1 to the number of training rounds M, first initialize the current state;

from t = 1 to the number of time steps T of each round, execute:

select an action according to the current state using the $\epsilon$-greedy strategy; when the selected action is not in the effective action space, re-select until an action in the effective action space is chosen;

execute the action, observe the reward $r$ and the next state $s'$;

save the experience tuple of state, action, reward and next state $(s, a, r, s')$ obtained by executing the current action into the prioritized experience replay pool, set the initial priority $w_i$, and update the current state $s = s'$;

from j = 0 to minibatch K, execute:

(1) sample from the prioritized experience replay pool according to the set weights, preferentially selecting samples with large weights;

(2) obtain the action $a^* = \arg\max_a Q(s', a; \theta_t)$ of maximum Q value from the current network, compute the target network's Q value $Q'(s', a^*; \theta'_t)$, and set the target value $Y_i^{PERDDQN} = r_i + \gamma Q'(s', a^*; \theta'_t)$, wherein $r_i$ is the reward and $\gamma$ is the reward discount factor;

(3) compute the difference between the target Q value and the current Q network estimate, $\delta_i = Y_i^{PERDDQN} - Q(s, a; \theta_t)$; the larger the TD error, the larger the back-propagation effect and the faster the network parameters are trained;

(4) set the sampling probability and priority weight of the experience data according to the TD error,

$p_i = |\delta_i| + \varepsilon, \quad P_i = \frac{p_i}{\sum_k p_k}, \quad w_i = (N \cdot P_i)^{-\beta}$

and update the sample priorities, wherein $P_i$ is the sampling probability, $\beta$ is the sampling weight coefficient, $w_i$ is the priority weight, $\delta_i$ is the TD error, $N$ is the number of samples in the experience pool and $\varepsilon$ a small positive constant;

after adding priorities to the experience pool, the loss function is adjusted to:

$L(\theta_t) = \frac{1}{M} \sum_i w_i \left(Y_i^{PERDDQN} - Q(s, a; \theta_t)\right)^2$

where M is the number of edge servers;

compute the gradient $\nabla_{\theta_t} L(\theta_t)$ and update the current Q network parameters by gradient descent, wherein $\nabla_{\theta_t} Q$ is the derivative of the Q value with respect to $\theta_t$;

update the effective action space according to the remaining computing and communication resources of each edge server in the environment, and periodically update the target network parameters at frequency F.
The optimization problem aims at minimizing the long-term overhead of the system, and the final goal of the deep reinforcement learning is to maximize the long-term expected rewards, so that the magnitude of the rewards is set in negative correlation with the system overhead.
The invention also provides an end-edge cloud system architecture, comprising a local layer, an edge layer and a cloud layer connected in communication in sequence;
The local layer comprises a plurality of vehicles, and the vehicles carry dual-mode communication modules;
the edge layer comprises base stations, road side units and edge servers; the vehicles interact with the road side units through the dual-mode communication module and establish connections with the base stations, and in each time interval all vehicles send their own state and the task information they randomly generate at the current moment to the decision maker server;
the cloud layer comprises a cloud server, which executes the above task offloading and resource allocation optimization method to optimize task offloading and resource allocation.
Using this architecture, task offloading and resource allocation optimization are realized with high algorithm learning efficiency and fast convergence speed.
Drawings
FIG. 1 is a flow diagram of the task offloading and resource allocation optimization method of the present invention;
FIG. 2 is a schematic structural diagram of the end-edge cloud system architecture of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and defined, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanical or electrical, or may be in communication with each other between two elements, directly or indirectly through intermediaries, as would be understood by those skilled in the art, in view of the specific meaning of the terms described above.
The invention discloses a method for establishing an end-edge cloud system architecture, comprising the following steps:
deploying Road Side Units (RSUs) and base stations (eNBs) along a single-lane straight road, with adjacent base stations spaced L meters apart and each base station, whose signal coverage radius is L/2, equipped with an edge server;
The vehicle and the road side unit realize data interaction and establish connection with the base station; the vehicle is provided with a PC5/UU dual-mode communication module, realizes data interaction with an RSU (Road Side Unit) through a PC5 mode, and establishes connection with an eNB through a UU port;
the system comprises K local vehicles, M edge servers (not including the decision maker server) and a cloud server; in each time interval, each vehicle randomly generates a task

$task_i(t) = \{b_i, c_i, T_i^{max}\}$

wherein $b_i$ is the task data volume in bits; $c_i$ is the amount of computation required per unit of task data, in cycles/bit; $T_i^{max}$ is the tolerable delay of the task; $i$ denotes the $i$-th vehicle and $t$ the $t$-th time slot. Task types include general, computation-intensive, data-intensive and delay-sensitive. Tasks differ in their preference for delay versus energy consumption: a delay-sensitive task places more importance on optimizing delay to guarantee a timely response, so its delay weight is higher; data-intensive and computation-intensive tasks generate large energy consumption in the data transmission and computation processes respectively, and compared with delay-sensitive tasks they have looser delay requirements and a higher energy consumption preference.

The vehicle's task offloading division proportion and the remaining computing and communication resources of the MEC servers affect the selection of the MEC server for offloading; for the portion of a task offloaded to an MEC server, the target server for execution must be selected globally, since this choice affects the offloading transmission delay and thereby the total delay of execution on the MEC server. Tasks of different sizes have different tolerable delays, and tasks not completed within their tolerable delay are defined as failed; the task success rate is therefore defined as the number of tasks successfully completed within their tolerable delay divided by the total number of tasks. The local task set of time slot $t$ is:

$Task(t) = \{task_1(t), task_2(t), \ldots, task_K(t)\}$
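By way of illustration, the following minimal Python sketch (all identifiers are assumed for illustration and are not part of the patent) models the task tuple $\{b_i, c_i, T_i^{max}\}$ and the success-rate statistic defined above:

```python
from dataclasses import dataclass

@dataclass
class Task:
    b: float      # task data volume b_i (bits)
    c: float      # computation per unit data c_i (cycles/bit)
    t_max: float  # tolerable delay T_i^max (s)

def success_rate(finish_times, deadlines):
    # A task succeeds only when it completes within its tolerable delay;
    # the rate is successful tasks divided by total tasks.
    ok = sum(1 for ft, d in zip(finish_times, deadlines) if ft <= d)
    return ok / len(deadlines) if deadlines else 0.0
```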
in each time interval, all vehicles send their own state to the decision maker server (a single edge server acting as the decision-making server); a vehicle's own state comprises information such as its computing capacity and position, together with the task information it randomly generates at the current moment;
after collecting the available computing and communication resource situation and the state information of all edge servers in the end-edge cloud system at the current moment (the state information being the available computing and communication resources of the edge servers in the system, together with the tasks generated by the vehicles and the local computing resources), the decision maker server selects an action decision based on the current state;
after the decision maker server makes the action decision, the result is returned to the local vehicles; the local vehicles execute the tasks according to the returned result, offloading them to the corresponding computing units; the system overhead is counted, and the reward is computed and returned to the decision maker server.
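The interaction between vehicles and the decision maker server described above can be sketched as a simple control loop; env, agent and their methods are assumed names, not part of the patent:

```python
def decision_loop(env, agent, T):
    # One episode of the end-edge cloud interaction: the decision maker
    # server observes the global state, returns an action decision, and
    # learns from the reward computed after the vehicles execute/offload.
    for t in range(T):
        state = env.observe()                    # vehicle states + edge resources
        action = agent.act(state)                # offload + allocation decision
        reward, next_state = env.apply(action)   # execute tasks, count overhead
        agent.learn(state, action, reward, next_state)
```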
The invention also provides a task offloading and resource allocation optimization method that takes the weighted sum of delay and energy consumption as the optimization target, jointly considers task offloading and resource allocation in the end-edge cloud system, and accounts for the dynamic characteristics of different vehicle-mounted tasks and the states of communication and computing resources. A task offloading and task migration model is proposed; the task migration problem is solved with a set-based particle swarm optimization (SPSO) algorithm, and the joint decision problem of task offloading and resource allocation is solved with the deep reinforcement learning framework PERDDQN (Double DQN with prioritized experience replay), so as to minimize the long-term overhead of the system.
As shown in FIG. 1, the task offloading and resource allocation optimization method includes the steps of:
according to the above establishment method, establishing the end-edge cloud system architecture; in each time interval, all vehicles send the task information randomly generated at the current moment and their own state to the decision maker server, the task types being general, computation-intensive, data-intensive or delay-sensitive;

based on the characteristics of the different computing tasks, taking the delay and energy consumption preference factors into account and introducing a load balancing factor, calculating the task migration cost and constructing a task offloading model and a task migration model;

establishing an objective function using the task offloading model and the task migration model;

constructing the state space and action space of one of a deep reinforcement learning algorithm, game theory, a genetic algorithm, an ant colony algorithm or Lyapunov optimization;

and constructing a reward function according to the optimization objective, and solving it to realize task offloading and resource allocation optimization.
In a preferred scheme of the invention, the task offloading model describes the delay and processing energy consumption of offloading a task to different devices for execution, and consists of three parts: the local computation delay and energy consumption of the task; the computation delay and energy consumption on an MEC server reached through the wireless network; and the computation delay and energy consumption of offloading to the cloud server. The method for constructing the task offloading model comprises:

the local computation delay $T_i^{local}$ and energy consumption $E_i^{local}$ of the task are:

$T_i^{local} = \frac{\alpha_i^{local} b_i c_i}{f_i^{local}}, \quad E_i^{local} = k_{local} (f_i^{local})^2 \alpha_i^{local} b_i c_i$

wherein $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ respectively represent the proportions of the task offloaded to the local vehicle, the edge server and the cloud server, and $f_i^{local}$ represents the computing capacity of the vehicle; $k_{local}$ is the effective capacitance switching factor of the chip architecture, i.e. the inherent energy consumption coefficient of the local vehicle's CPU; $b_i$ is the task data volume and $c_i$ the amount of computation required per unit of task data;
in the end-edge cloud architecture, the local vehicle communicates wirelessly with the edge server; ignoring communication interference between the local vehicle and the edge server, the maximum uplink data rate $r_{i,j}$ between the local vehicle of task $i$ and the edge server is obtained from Shannon's formula as:

$r_{i,j} = W \log_2\left(1 + \frac{p_i g_i}{\sigma^2}\right)$

wherein $W$ represents the channel bandwidth of the base station, $p_i$ the signal transmit power of the local vehicle, $g_i$ the channel gain between the local vehicle and the edge server, and $\sigma^2$ the Gaussian white noise power in the wireless communication environment;
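A one-line Python rendering of this rate computation (function name and argument order are assumptions for illustration):

```python
import math

def uplink_rate(W, p_i, g_i, sigma2):
    # r_ij = W * log2(1 + p_i * g_i / sigma^2), interference ignored.
    return W * math.log2(1.0 + p_i * g_i / sigma2)
```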
the delay of offloading task $i$ to the edge server consists of three parts: the data upload delay $T_i^{up,edge}$, the processing delay $T_i^{exe,edge}$ and the result return delay. Since the data volume of the task result is much smaller than the uploaded data volume, the return delay of sending the computation result from the edge server back to the local vehicle is negligible. Similarly, the energy consumption of offloading the task to the MEC edge server consists of the data upload energy consumption $E_i^{up,edge}$, the processing energy consumption $E_i^{exe,edge}$ and the return energy consumption. The delay $T_i^{edge}$ and energy consumption $E_i^{edge}$ of offloading the task onto the MEC edge server are expressed as:

$T_i^{edge} = T_i^{up,edge} + T_i^{exe,edge} = \frac{\alpha_i^{edge} b_i}{B_{i,j} r_{i,j}} + \frac{\alpha_i^{edge} b_i c_i}{Cpt_{i,j} f_j^{edge}}$

$E_i^{edge} = p_i T_i^{up,edge} + k_{edge} (f_j^{edge})^2 \alpha_i^{edge} b_i c_i$

wherein $B_{i,j}, Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to the task, i.e. the shares of communication and computing resources that the $j$-th edge server allocates to task $i$; $k_{edge}$ is the inherent energy consumption coefficient of the edge server's CPU and $f_j^{edge}$ the computing capacity of the corresponding edge server;
when the task is offloaded to the cloud server for execution, the task data is first uploaded to the MEC edge server and then relayed from the MEC edge server to the cloud server. The cloud server is physically distant and has massive computing resources, so the transmission delay from the edge server to the cloud server is taken as a fixed value (FD), and the execution delay and the energy consumption of the wired communication are negligible. The delay $T_i^{cloud}$ and energy consumption $E_i^{cloud}$ of offloading the task to the cloud server are expressed as:

$T_i^{cloud} = T_i^{up,cloud} + FD, \quad E_i^{cloud} = p_i T_i^{up,cloud}$

wherein $T_i^{up,cloud} = \frac{\alpha_i^{cloud} b_i}{B_{i,j} r_{i,j}}$ is the delay of uploading the data to the cloud server.
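The three cost components of the offloading model can be sketched as follows; this is a minimal illustration of the formulas above, with argument names assumed:

```python
def local_cost(a_local, b, c, f_local, k_local):
    # Delay and energy of the locally executed share of the task.
    t = a_local * b * c / f_local
    e = k_local * f_local ** 2 * a_local * b * c
    return t, e

def edge_cost(a_edge, b, c, B_ij, r_ij, cpt_ij, f_edge, p_i, k_edge):
    # Upload plus execution on the MEC server; result return is neglected.
    t_up = a_edge * b / (B_ij * r_ij)
    t_exe = a_edge * b * c / (cpt_ij * f_edge)
    e = p_i * t_up + k_edge * f_edge ** 2 * a_edge * b * c
    return t_up + t_exe, e

def cloud_cost(a_cloud, b, B_ij, r_ij, p_i, fd):
    # Relay via the edge server plus the fixed edge-to-cloud delay fd;
    # cloud execution delay and wired-link energy are treated as negligible.
    t_up = a_cloud * b / (B_ij * r_ij)
    return t_up + fd, p_i * t_up
```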
In a preferred scheme of the invention, the specific method for constructing the task migration model is as follows:

after a vehicle offloads a task to the MEC server beside a BS (base station), the BS decides whether to migrate the computing task to the MEC server beside another BS, because the MEC server currently accessed by the vehicle may have insufficient remaining communication and computing resources; thus for each task the question of which server should be the offloading target must be solved. Since the portion offloaded from the vehicle to the MEC server is already only part of the task, it is not split further when the task is migrated; the whole portion is migrated. During migration, the migration delay $T_i^{mig}$ is:

$T_i^{mig} = T_i^{up,near} + T_i^{trans} + T_i^{exe}$

wherein $T_i^{up,near}$ represents the upload delay to the nearest edge server, $T_i^{trans}$ represents the transmission delay of carrying the offloaded task to the target edge server chosen by the particle swarm algorithm, and $T_i^{exe}$ represents the execution delay.
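A sketch of the migration delay under the formula above; the backhaul rate r_mig between edge servers is an assumed parameter, not specified in the patent:

```python
def migration_delay(a_edge, b, c, B_ij, r_near, r_mig, cpt, f_target):
    # Whole-share migration: upload to the nearest edge server, forward to
    # the SPSO-selected target over the backhaul, then execute there.
    t_up = a_edge * b / (B_ij * r_near)
    t_trans = a_edge * b / r_mig
    t_exe = a_edge * b * c / (cpt * f_target)
    return t_up + t_trans + t_exe
```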
In a preferred scheme of the invention, a set-based particle swarm optimization (SPSO) algorithm is adopted for selecting the server during task offloading:

the particle position code is set to represent the edge server selection scheme of all current tasks, recorded as $X = (x_1, x_2, \ldots, x_K)$; if there are $m$ edge servers in total, then $x_i = j\ (1 \le j \le m)$ represents that the $i$-th task selects edge server $j$ as the offloading target at the edge layer, and the velocity of a particle represents the tendency of the current task to select other edge servers for offloading;

the specific steps are as follows:

S1, initializing the particle swarm, including the swarm size and the velocity and position of each particle, and simultaneously initializing the individual best position of each particle and the global best position;

S2, triggering one iteration each time a request arrives; combining the current task attributes and the available resources of each edge server, the fitness of each particle is evaluated according to the fitness function

$F = \sum_{i=1}^{K}\left(T_i^{mig} + \varphi_i\right), \quad \varphi_i = \begin{cases} G, & T_i^{mig} > T_i^{max} \\ 0, & \text{otherwise} \end{cases}$

wherein $K$ is the number of local vehicles, $T_i^{max}$ represents the maximum tolerable delay of the task, $T_i^{mig}$ is the migration delay, $\varphi_i$ denotes the penalty factor, and $G$ is the timeout penalty coefficient;

S3, updating the individual best position of each particle: if the fitness value of the particle's current position is better than that of its historical best position, the current position is set as the new individual best position;

S4, updating the global best position: among the individual best positions of all particles, the position with the best fitness value is selected as the global best position; the swarm best position after this round of iterative updating is taken as the solution to the request, and parsing the swarm best position code gives the serial number of the edge server selected for each task in the current request, realizing the set-based particle swarm edge server selection;

S5, updating the particle velocity and position according to

$v_i^{t+1} = \omega v_i^t + cp \cdot random \cdot (pbest_i - x_i^t) + cg \cdot random \cdot (gbest - x_i^t)$

$x_i^{t+1} = x_i^t + v_i^t$

wherein $x_i^t$ represents the position of particle $i$ at the $t$-th iteration; $v_i^{t+1}$ and $v_i^t$ are the velocities of particle $i$ at iterations $t+1$ and $t$; $\omega v_i^t$ represents the inertial direction, i.e. the influence of the previous velocity on the current velocity, where a larger inertia weight $\omega$ gives a stronger global exploration capability; $cp$ is the individual learning factor, representing the propulsion of the particle towards the individually known local optimum, and $cg$ is the swarm learning factor, representing the propulsion of the particle towards the swarm's global optimum; $random$ denotes a random number between 0 and 1, drawn independently for each term; at each iteration, the particles move under the combined action of the previous movement tendency, self-cognition and swarm cognition, gradually converging to the global best position, and the current position of a particle is determined jointly by its previous position and its velocity;
S6, judging whether a new request is generated, if no request exists, ending the algorithm, otherwise, returning to the step S2.
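Steps S1 to S6 can be condensed into the following simplified discrete-PSO sketch; a plain rounding-based discretization stands in for the full set-based operators, and the swarm parameters, the Task type from the earlier sketch, and the mig_delay callable are assumptions:

```python
import random

def spso_select_servers(tasks, servers, mig_delay, iters=50, swarm=30,
                        w=0.7, cp=1.5, cg=1.5, G=1e3):
    # position[i] = index of the edge server selected for task i;
    # fitness sums migration delays plus the timeout penalty phi_i = G.
    m, K = len(servers), len(tasks)

    def fitness(pos):
        total = 0.0
        for i, t in enumerate(tasks):
            d = mig_delay(t, servers[pos[i]])
            total += d + (G if d > t.t_max else 0.0)
        return total

    X = [[random.randrange(m) for _ in range(K)] for _ in range(swarm)]
    V = [[0.0] * K for _ in range(swarm)]
    pbest = [p[:] for p in X]
    pfit = [fitness(p) for p in X]
    g = min(range(swarm), key=lambda s: pfit[s])
    gbest, gfit = pbest[g][:], pfit[g]

    for _ in range(iters):
        for s in range(swarm):
            for i in range(K):
                V[s][i] = (w * V[s][i]
                           + cp * random.random() * (pbest[s][i] - X[s][i])
                           + cg * random.random() * (gbest[i] - X[s][i]))
                X[s][i] = int(round(X[s][i] + V[s][i])) % m  # keep in range
            f = fitness(X[s])
            if f < pfit[s]:
                pbest[s], pfit[s] = X[s][:], f
                if f < gfit:
                    gbest, gfit = X[s][:], f
    return gbest
```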
In a preferred embodiment of the present invention, the method for establishing the objective function comprises:

the total delay $T_i^{total}$ and total energy consumption $E_i^{total}$ of a single task are:

$T_i^{total} = \max\left(T_i^{local},\ T_i^{edge},\ T_i^{cloud}\right), \quad E_i^{total} = E_i^{local} + E_i^{edge} + E_i^{cloud}$

wherein $T_i^{local}, E_i^{local}$ are the delay and energy consumption of local execution, $T_i^{edge}, E_i^{edge}$ the delay and energy consumption of offloading the task to the edge server, and $T_i^{cloud}, E_i^{cloud}$ the delay and energy consumption of offloading the task to the cloud server. The total delay is the maximum of the computation delay of local execution, the upload-plus-computation delay of offloading to the MEC (edge) server (if the SPSO algorithm determines that the target MEC server executing the task is not the one currently accessed, the transmission delay between MEC servers is added), and the upload-plus-computation delay of offloading to the cloud server. The total energy consumption is the sum of the computation energy consumption and the transmission energy consumption.

Taking the trade-off between delay and energy consumption as the optimization target, the overhead $C_i(t)$ of task $task_i(t)$ is defined as:

$C_i(t) = w_i^T T_i^{total} + w_i^E E_i^{total}$

wherein $w_i^T$ and $w_i^E$ respectively denote the delay weight and the energy consumption weight, which can be dynamically adjusted according to the delay and energy consumption demands of different tasks;
the optimization objective is to minimize the overhead of the tasks generated by the K vehicles during time T, expressed as:

$\min \sum_{t=1}^{T} \sum_{i=1}^{K} C_i(t)$

$\text{s.t.}\ C1: 0 \le \alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud} \le 1, \ \forall i$

$C2: \alpha_i^{local} + \alpha_i^{edge} + \alpha_i^{cloud} = 1, \ \forall i$

$C3: T_i^{total} \le T_i^{max}, \ \forall i$

$C4: \sum_{i} Cpt_{i,j} \le 1, \ \forall j$

$C5: \sum_{i} B_{i,j} \le 1, \ \forall j$

$C6: 0 \le w_i^T, w_i^E \le 1, \ w_i^T + w_i^E = 1$

wherein $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ respectively represent the proportions of the task offloaded to the local vehicle, the edge server and the cloud server, and $B_{i,j}, Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to the task, i.e. the shares of communication and computing resources that the $j$-th edge server allocates to task $i$; C1 and C2 constrain the value range of the task offloading proportion decision, meaning the task can be divided in any proportion; C3 states that the processing delay of a task does not exceed its maximum tolerable delay, ensuring the task responds on time; constraint C4 states that the sum of computing resources each MEC server allocates to the tasks it serves does not exceed the maximum resources of that edge server; similarly, constraint C5 states that the sum of communication resources each MEC server allocates to the tasks within its coverage does not exceed its own maximum communication resources; constraint C6 gives the value range of the delay and energy consumption weights.
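Putting the pieces together, the per-task overhead $C_i(t)$ can be evaluated as below, reusing the local_cost/edge_cost/cloud_cost helpers from the offloading-model sketch earlier; the decision container d and the constants are illustrative assumptions:

```python
K_LOCAL, K_EDGE, FD = 1e-27, 1e-27, 0.05  # assumed illustrative constants

def task_overhead(task, d, w_t, w_e):
    # d bundles one task's decision variables: offload proportions
    # (a_local, a_edge, a_cloud) and the resources granted by server j.
    t_loc, e_loc = local_cost(d.a_local, task.b, task.c, d.f_local, K_LOCAL)
    t_edg, e_edg = edge_cost(d.a_edge, task.b, task.c, d.B, d.r, d.cpt,
                             d.f_edge, d.p, K_EDGE)
    t_cld, e_cld = cloud_cost(d.a_cloud, task.b, d.B, d.r, d.p, FD)
    t_total = max(t_loc, t_edg, t_cld)    # the three shares run in parallel
    e_total = e_loc + e_edg + e_cld
    return w_t * t_total + w_e * e_total  # C_i(t), with w_t + w_e = 1
```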
To minimize the long-term overhead, the offloading proportions of the local vehicle tasks and the communication and computing resource allocations of the MEC servers are decided jointly, so the proposed partial computation offloading problem for vehicles is solved with a deep reinforcement learning algorithm. The problem is first formulated as a Markov Decision Process (MDP) and then solved with the PERDDQN algorithm to explore a more efficient and stable task offloading strategy that copes with more complex and changeable traffic scenes.
An edge server is used as the decision maker and serves as the agent in DRL (deep reinforcement learning), and the state space, action space and reward function of the reinforcement learning problem are defined accordingly.
In a preferred embodiment of the present invention, the state space contains the task information set at the current moment, the available communication resources of each MEC server, the computing resources of all local vehicles, and the maximum data upload rates between all local vehicles and the MEC servers. The task information includes the data volume, computation amount and maximum tolerable delay. The action space contains the task offloading proportions and the communication and computing resource allocations of the $i$-th MEC server. The method for constructing the state space of the deep reinforcement learning algorithm is as follows:
to minimize the long-term overhead, the state space contains the factors that affect the task offloading decision, mainly the tasks' own attributes and the system's available resources. The state space $s_t$ is:

$s_t = \{Task(t), F_{edge}(t), F_{local}(t), R(t)\}$

wherein $F_{edge}(t) = \{f_1^{edge}(t), \ldots, f_M^{edge}(t)\}$ denotes the computing resources available at all edge servers at the current moment, $f_j^{edge}$ being the computing capacity of the corresponding edge server; $F_{local}(t) = \{f_1^{local}, \ldots, f_K^{local}\}$ denotes the computing resources available at all local vehicles at the current moment, $f_i^{local}$ being the computing capacity of the vehicle; $R(t)$ is a $K \times M$ matrix whose element $r_{i,j}$ represents the maximum uplink data rate between the $i$-th local vehicle and the $j$-th edge server, with $r_{i,j} = 0$ if the vehicle is not within the coverage of the $j$-th base station.
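As an illustration, the observation can be flattened into a single vector for the Q network; field and function names are assumptions:

```python
import numpy as np

def build_state(tasks, f_edge_avail, f_local, rates):
    # s_t = {Task(t), F_edge(t), F_local(t), R(t)}; rates is the K x M
    # matrix R(t) with rates[i][j] = 0 when vehicle i is out of coverage.
    task_feats = np.array([[tk.b, tk.c, tk.t_max] for tk in tasks], float)
    return np.concatenate([task_feats.ravel(),
                           np.asarray(f_edge_avail, float),
                           np.asarray(f_local, float),
                           np.asarray(rates, float).ravel()])
```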
In a preferred scheme of the invention, the method for constructing the action space of the deep reinforcement learning algorithm comprises:

each action $a_t$ in the action space is defined as:

$a_t = \{\alpha_1(t), \ldots, \alpha_K(t), B_1(t), \ldots, B_M(t), Cpt_1(t), \ldots, Cpt_M(t)\}$

wherein $\alpha_i(t)$ represents the offloading proportion decision of the $i$-th task, and $B_i(t)$ and $Cpt_i(t)$ respectively represent the communication resource allocation decision and the computing resource allocation decision of the $i$-th edge server;
solving the action space amounts to solving the next offloading decision and the communication and computing resource allocation decisions; because the communication and computing resources of an edge server are limited, the sum of the communication resources it allocates to all tasks cannot exceed its available communication resources, and the sum of the computing resources it allocates to all tasks cannot exceed its available computing resources. The actions that would operate an edge server with no remaining communication or computing resources are removed from the action space, yielding the effective action space;

after introducing the effective action space, when selecting an action with the $\epsilon$-greedy strategy, it is judged whether the action belongs to the effective action space; if not, the action is re-selected; meanwhile, in each time interval, the effective action space is updated in combination with the resource situation of the edge servers.
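A minimal sketch of $\epsilon$-greedy selection over the effective action space; here invalid actions are masked up front, which avoids the re-selection loop entirely (a design simplification, not the patent's literal procedure):

```python
import random

def select_action(q_values, valid_mask, eps):
    # Actions of edge servers with no remaining communication/computing
    # resources are excluded before either exploring or exploiting.
    valid = [i for i, ok in enumerate(valid_mask) if ok]
    if random.random() < eps:
        return random.choice(valid)                   # explore
    return max(valid, key=lambda i: q_values[i])      # exploit
```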
In a preferred scheme of the invention, to meet the requirement of improving the task success rate, i.e. under the condition that the total delay of each task is smaller than its maximum tolerable delay, the long-term overhead of the system is optimized through the task offloading and resource allocation decisions, the system overhead being the weighted sum of total delay and total energy consumption. The optimization problem aims to minimize the long-term overhead of the system, while the final goal of deep reinforcement learning is to maximize the long-term expected reward, so the magnitude of the reward is set to be negatively correlated with the system overhead.

The problem optimization aims to minimize the delay and energy consumption of tasks, minimize the system overhead and optimize the task success rate. The method for constructing and solving the reward function is as follows:
the optimization objective is to minimize the delay and energy consumption of tasks, minimize the system overhead and optimize the task success rate, so the reward function $R_t$ is defined as:

$R_t = -\sum_{i=1}^{K} C_i(t)$

wherein $task_i(t)$ represents the task of the $i$-th vehicle at the $t$-th time slot, $K$ is the number of local vehicles, and $C_i(t)$ is the system overhead of task $task_i(t)$;
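The negative correlation between reward and overhead reduces to one line (assuming the per-task overheads $C_i(t)$ are already computed):

```python
def reward(task_overheads):
    # Maximizing the long-term expected reward therefore minimizes the
    # long-term system overhead, the stated optimization objective.
    return -sum(task_overheads)
```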
the task offloading decision and resource allocation process under the multi-constraint condition is solved with the PERDDQN algorithm:

randomly initialize the Q network parameters $\theta_t$ of the current DQN (deep Q network; $\theta_t$ are the network parameters in deep reinforcement learning, adjusted by gradient descent), and initialize the parameters of the target network $Q'$ as $\theta'_t = \theta_t$ (based on the DDQN (Double DQN) model with its two Q networks, the second parameter set belongs to the second Q network); initialize the effective action space and the exploration rate $\epsilon$;

from episode = 1 (episode is a machine learning term: M training rounds means M episodes) to the number of training rounds M, first initialize the current state;

from t = 1 to the number of time steps T of each round, execute:

select an action according to the current state using the $\epsilon$-greedy strategy; when the selected action is not in the effective action space, re-select until an action in the effective action space is chosen. The $\epsilon$-greedy strategy is a reinforcement learning exploration strategy based on the current state and actions; exploration means the agent selects unknown actions beyond the known (state, action) pairs. It is a common strategy in which, when making a decision, the agent with a small positive probability randomly selects an unknown action, and with the remaining probability selects the action with the highest past action value;

execute the action, observe the reward $r$ and the next state $s'$;

save the experience tuple of state, action, reward and next state $(s, a, r, s')$ obtained by executing the current action into the prioritized experience replay pool, set the initial priority $w_i$, and update the current state $s = s'$;

from j = 0 to minibatch K (minibatch is a machine learning term: each episode samples a batch of K), execute:

(1) sample from the prioritized experience replay pool according to the set weights, preferentially selecting samples with large weights;

(2) obtain the action $a^* = \arg\max_a Q(s', a; \theta_t)$ of maximum Q value from the current network, compute the target network's Q value $Q'(s', a^*; \theta'_t)$, and set the target value $Y_i^{PERDDQN} = r_i + \gamma Q'(s', a^*; \theta'_t)$, wherein $r_i$ is the reward and $\gamma$ is the reward discount factor;

(3) compute the difference (TD error) between the target Q value and the current Q network estimate, $\delta_i = Y_i^{PERDDQN} - Q(s, a; \theta_t)$; the larger the TD error, the larger the back-propagation effect and the faster the network parameters are trained;

(4) set the sampling probability and priority weight of the experience data according to the TD error,

$p_i = |\delta_i| + \varepsilon, \quad P_i = \frac{p_i}{\sum_k p_k}, \quad w_i = (N \cdot P_i)^{-\beta}$

and update the sample priorities, wherein $P_i$ is the sampling probability, $\beta$ is the sampling weight coefficient, $w_i$ is the priority weight, $\delta_i$ is the TD error, $N$ is the number of samples in the experience pool and $\varepsilon$ a small positive constant;

after adding priorities to the experience pool, the loss function is adjusted to:

$L(\theta_t) = \frac{1}{M} \sum_i w_i \left(Y_i^{PERDDQN} - Q(s, a; \theta_t)\right)^2$

where M is the number of edge servers;

compute the gradient $\nabla_{\theta_t} L(\theta_t)$ and update the current Q network parameters by gradient descent, wherein $\nabla_{\theta_t} Q$ is the derivative of the Q value with respect to $\theta_t$;

update the effective action space according to the remaining computing and communication resources of each edge server in the environment, and periodically update the target network parameters at frequency F.
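The two mechanisms the algorithm relies on, proportional prioritized replay and the double-DQN target, can be sketched as follows; capacity, eps and beta are assumed hyper-parameters, and q_net/target_net stand for callables returning a Q-value vector for a state:

```python
import numpy as np

class PrioritizedReplay:
    # Minimal proportional prioritized replay; eps keeps priorities
    # positive and beta is the importance-sampling exponent.
    def __init__(self, capacity=10000, eps=1e-3, beta=0.4):
        self.buf, self.prio = [], []
        self.capacity, self.eps, self.beta = capacity, eps, beta

    def push(self, transition, td_error=1.0):
        if len(self.buf) >= self.capacity:
            self.buf.pop(0)
            self.prio.pop(0)
        self.buf.append(transition)
        self.prio.append(abs(td_error) + self.eps)

    def sample(self, k):
        p = np.array(self.prio)
        p = p / p.sum()
        idx = np.random.choice(len(self.buf), size=k, p=p)
        w = (len(self.buf) * p[idx]) ** (-self.beta)  # IS weights
        return idx, [self.buf[i] for i in idx], w / w.max()

    def update(self, idx, td_errors):
        for i, d in zip(idx, td_errors):
            self.prio[i] = abs(d) + self.eps

def ddqn_target(q_net, target_net, r, s2, gamma):
    # Double-DQN target: the current network selects a*, the target
    # network evaluates it, decoupling selection from evaluation.
    a_star = int(np.argmax(q_net(s2)))
    return r + gamma * target_net(s2)[a_star]
```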
According to the invention, optimization research is carried out on task unloading and resource allocation strategies under a terminal edge cloud architecture, and the task information of the vehicle and the information of the MEC edge server cluster are collected; these elements are used to construct a state space, a target MEC server is selected according to the SPSO algorithm, noise is added to search a neural network, and an action in an effective action space is determined. And compared with the DRL algorithms such as DDPG, DQN and the like, the PERDDQN algorithm has the advantages of double-Q-value network and delayed updating, so that the network training achieves the effects of faster convergence, higher success rate and more accuracy.
In the PERDDQN algorithm, a dual-neural-network mechanism decouples action selection from action evaluation, and the relatively smaller value is chosen as the network update target, avoiding the effect of target Q value overestimation on the current Q network's estimate and effectively mitigating the Q-value overestimation problem. Meanwhile, PERDDQN introduces priorities for samples in the experience pool, assigning each sample a priority according to the absolute value of its TD error. The concept of an effective action space is introduced, reducing meaningless action selection and strengthening the algorithm's learning from important samples, thereby promoting convergence.
The technical scheme defines the calculation models of delay and energy consumption for tasks at different layers under the end-edge cloud architecture, considers the case where the MEC server a vehicle currently accesses cannot serve it, and uses the SPSO algorithm to select the target MEC server for offloading, guaranteeing the completion rate of vehicle tasks within the task tolerance delay; the task migration algorithm among serving MEC servers is optimized, satisfying the low-delay completion requirement of computation-intensive tasks and relieving excessive computing resource load.
The invention also provides an end-edge cloud system architecture, shown in figure 2, comprising a local layer, an edge layer and a cloud layer in communication connection in sequence. The local layer comprises a plurality of vehicles carrying dual-mode communication modules. The edge layer comprises base stations, road side units and edge processors. The vehicles exchange data with the road side units through the dual-mode communication module and establish connections with the base stations; in each time interval, all vehicles send their own state and the task information they randomly generate at the current moment to the decision maker server. The cloud layer comprises a cloud server, which executes the task offloading and resource allocation optimization method above. With this architecture, task offloading and resource allocation optimization is realized with high algorithm learning efficiency and fast convergence.
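As a purely illustrative aid, the three layers could be modeled as plain data types; every name below is an assumption made for the sketch rather than part of the disclosure:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Vehicle:              # local layer: carries a dual-mode communication module
    vehicle_id: int
    cpu_hz: float           # f_i^local, the vehicle's computing power

@dataclass
class EdgeServer:           # edge layer: base station + roadside unit + edge processor
    server_id: int
    cpu_hz: float           # f_j^edge
    bandwidth_hz: float     # communication resource pool

@dataclass
class EndEdgeCloud:         # cloud layer holds the decision maker / optimizer
    vehicles: List[Vehicle] = field(default_factory=list)
    edge_servers: List[EdgeServer] = field(default_factory=list)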
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A method of establishing an end-edge cloud system architecture, comprising the steps of:
deploying road side units and base stations along a single-lane straight road, with adjacent base stations spaced L meters apart, each base station having a signal coverage radius of L/2 and being provided with an edge server;
The vehicle and the road side unit realize data interaction and establish connection with the base station;
the system comprises K local vehicles, M edge servers and one cloud server; in each time interval, each vehicle randomly generates a task $T_i^t = (b_i, c_i, \tau_i^{\max})$,
where $b_i$ is the task data amount, $c_i$ is the computation required per unit of task data, and $\tau_i^{\max}$ is the tolerance delay of the task; i denotes the i-th vehicle and t denotes the t-th time slot;
the local task set of time slot t is $Task(t) = \{T_1^t, T_2^t, \ldots, T_K^t\}$;
in each time interval, all vehicles send their own states to a decision maker server;
the decision maker server collects the available computing resource and communication resource conditions and the state information of all edge servers in the end-edge cloud system at the current moment, and then selects an action decision based on the current state;
after the decision maker server makes the action decision, the result is returned to the local vehicles; the local vehicles execute tasks with reference to the result sent by the decision maker server, offloading tasks to the corresponding computing units; the system overhead is counted, and the reward is calculated and returned to the decision maker server.
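A minimal sketch of the per-time-slot interaction in this claim might look as follows, assuming hypothetical state(), generate_task(), decide(), execute() and receive_reward() helpers:

def run_time_slot(vehicles, decision_maker):
    # 1. each vehicle reports its state and the task it randomly generated this slot
    reports = [(v.state(), v.generate_task()) for v in vehicles]
    # 2. the decision maker observes current edge resources and picks action decisions
    actions = decision_maker.decide(reports)
    # 3. vehicles execute with reference to the returned result, offloading tasks
    #    to the corresponding computing units; system overhead is counted
    overheads = [v.execute(task, act)
                 for v, (_, task), act in zip(vehicles, reports, actions)]
    # 4. the reward is computed and returned to the decision maker
    decision_maker.receive_reward(-sum(overheads))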
2. A method for task offloading and resource allocation optimization, comprising the steps of:
establishing an end-edge cloud system architecture by the method of claim 1, wherein in each time interval all vehicles send the task information they randomly generate at the current moment and their own states to the decision maker server, the task type being ordinary, computation-intensive, data-intensive or delay-sensitive;
based on the characteristics of different computing tasks, taking the delay and energy consumption bias factors into consideration, introducing a load balancing factor, calculating the task migration cost, and constructing a task offloading model and a task migration model;
establishing an objective function by using a task unloading model and a task migration model;
constructing a state space and an action space;
and constructing a reward function according to the optimization target, and solving the reward function to realize task unloading and resource allocation optimization.
3. The task offloading and resource allocation optimization method of claim 2, wherein the method of constructing a task offloading model is:
the delay and processing energy consumption of offloading a task to different devices for execution comprise: the delay and energy consumption of local task computation; the delay and energy consumption of offloading to an MEC server through the wireless network and computing on the MEC server; and the delay and energy consumption of offloading to the cloud server;
the task local computation time $T_i^{local}$ and energy consumption $E_i^{local}$ are:
$T_i^{local} = \frac{\alpha_i^{local} b_i c_i}{f_i^{local}}$, $E_i^{local} = k^{local} (f_i^{local})^2 \alpha_i^{local} b_i c_i$
where $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ represent the proportions of the task offloaded to the local device, the edge server and the cloud server, respectively; $f_i^{local}$ represents the computing power of the vehicle; $k^{local}$ is the effective capacitance switching factor determined by the chip architecture, i.e. the inherent energy consumption coefficient of the local vehicle's CPU; $b_i$ is the task data amount and $c_i$ is the computation required per unit of task data;
the delay of offloading task i to the edge server consists of three parts: the data upload delay $T_i^{up,edge}$, the processing delay $T_i^{exe,edge}$ and the result return delay; the energy consumption of offloading the task to the MEC edge server consists of the data upload energy $E_i^{up,edge}$, the processing energy $E_i^{exe,edge}$ and the return energy; the delay $T_i^{edge}$ and energy consumption $E_i^{edge}$ of offloading task $T_i^t$ onto the MEC edge server are expressed as:
$T_i^{edge} = T_i^{up,edge} + T_i^{exe,edge} + T_i^{re,edge}$, with $T_i^{up,edge} = \frac{\alpha_i^{edge} b_i}{B_{i,j} r_{i,j}}$ and $T_i^{exe,edge} = \frac{\alpha_i^{edge} b_i c_i}{Cpt_{i,j} f_j^{edge}}$; $E_i^{edge} = E_i^{up,edge} + E_i^{exe,edge} + E_i^{re,edge}$, with $E_i^{up,edge} = p_i T_i^{up,edge}$ and $E_i^{exe,edge} = k^{edge} (f_j^{edge})^2 \alpha_i^{edge} b_i c_i$
where $B_{i,j}$ and $Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to task $T_i^t$, i.e. the communication and computing resource shares allocated by the j-th edge server to task i; $k^{edge}$ is the inherent energy consumption coefficient of the edge server's CPU; $f_j^{edge}$ is the computing power of the corresponding edge server; $r_{i,j}$ is the maximum uplink data rate between the local vehicle of task i and the edge server; $p_i$ is the signal transmit power of the local vehicle;
when the task is offloaded to the cloud server for execution, the task data is first uploaded to the MEC edge server and then relayed from the MEC edge server to the cloud server; the transmission delay from the edge server to the cloud server is treated as a fixed value FD, and the delay $T_i^{cloud}$ and energy consumption $E_i^{cloud}$ of offloading the task to the cloud server are expressed as:
$T_i^{cloud} = T_i^{up,cloud} + FD + T_i^{exe,cloud}$
where $T_i^{up,cloud}$ is the delay of uploading the data to the cloud server via the MEC relay.
4. The task offloading and resource allocation optimization method of claim 2, wherein the specific method of constructing the task migration model is as follows:
performing whole-task migration, where the migration delay $T_i^{mig}$ during the migration process is:
$T_i^{mig} = T_i^{up} + T_i^{trans} + T_i^{exe}$
where $T_i^{up}$ represents the upload delay to the nearest edge server, $T_i^{trans}$ represents the transmission delay of offloading the task to the target edge server selected by the particle swarm algorithm, and $T_i^{exe}$ represents the delay spent on execution.
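A trivial sketch of this migration delay, with the three stage delays passed in as precomputed values (an assumption of this sketch):

def migration_delay(t_up_nearest, t_transfer, t_execute):
    # T_i^mig = upload to nearest edge + transfer to SPSO-chosen target + execution
    return t_up_nearest + t_transfer + t_execute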
5. The task offloading and resource allocation optimization method of claim 2, wherein a set-based particle swarm algorithm is used for server selection during task offloading:
setting the particle position code to represent the edge server selection scheme of all current tasks, denoted $X = (x_1, x_2, \ldots, x_K)$; if there are m edge servers in total, $x_i = j$ $(1 \le j \le m)$ represents that the i-th task selects edge server j at the edge layer as the offloading target; the velocity of a particle represents how strongly the current task tends to select a different edge server for offloading;
the method comprises the following specific steps:
S1, initializing the particle swarm, including the swarm size and the velocity and position of each particle, and simultaneously initializing each particle's individual optimal position and the global optimal position;
S2, an iteration is triggered each time a request arrives; combining the current task attributes and the available resources of each edge server, the fitness of each particle is evaluated according to the fitness function $F = \sum_{i=1}^{K} \left( T_i^{mig} + g \xi_i \right)$, where K is the number of local vehicles, $\tau_i^{\max}$ represents the maximum tolerable delay of a task, $T_i^{mig}$ is the migration delay, $\xi_i$ denotes a penalty factor that takes effect when $T_i^{mig} > \tau_i^{\max}$, and g is the timeout penalty coefficient;
S3, updating the individual optimal position of each particle: if the fitness value of a particle's current position is better than that of its historical optimal position, the current position is set as the new individual optimal position;
S4, updating the global optimal position: among the individual optimal positions of all particles, the position with the best fitness value is selected as the global optimal position; the swarm optimal position after this round of iterative updating is taken as the solution for the request, and decoding the swarm optimal position code yields the serial number of the edge server selected by each task in the current request, realizing set-based particle swarm edge server selection;
S5, updating particle velocity and position according to
$v_i^{t+1} = \omega v_i^t + c_p\, rand_p\, (p_i^{best} - x_i^t) + c_g\, rand_g\, (g^{best} - x_i^t)$ and $x_i^{t+1} = x_i^t + v_i^t$
where $x_i^t$ represents the position of particle i at the t-th iteration; $v_i^{t+1}$ and $v_i^t$ are the velocities of particle i at iterations t+1 and t, respectively; $\omega v_i^t$ represents the inertial component, i.e. the particle's current velocity is influenced by its previous velocity, and the larger the inertia weight $\omega$, the stronger the global exploration capability; $c_p$ is the individual learning factor, representing the propulsion toward the particle's individually known optimum $p_i^{best}$, and $c_g$ is the swarm learning factor, representing the propulsion toward the swarm's known global optimum $g^{best}$; $rand_p$ and $rand_g$ are random numbers between 0 and 1; at each iteration the particles move under the combined action of the previous movement trend, self-cognition and swarm cognition, gradually converging to the global optimal position; the current position of a particle is determined by its previous position and velocity, and the position update is expressed as:
$x_i^{t+1} = x_i^t + v_i^t$
S6, judging whether a new request is generated; if there is no request, the algorithm ends, otherwise return to step S2.
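Steps S1-S6 can be sketched roughly as follows; fitness is a user-supplied callable, and the discretization of positions by rounding and clamping is an assumption of the sketch, since the claim does not spell out how the set-based encoding keeps positions on valid server indices:

import random

def spso(num_tasks, num_servers, fitness, iters=50, swarm=20,
         w=0.7, c_p=1.5, c_g=1.5):
    # position: one server index per task; velocity: tendency to switch servers
    X = [[random.randrange(num_servers) for _ in range(num_tasks)]
         for _ in range(swarm)]
    V = [[0.0] * num_tasks for _ in range(swarm)]
    pbest = [x[:] for x in X]
    gbest = min(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(num_tasks):
                V[i][d] = (w * V[i][d]
                           + c_p * random.random() * (pbest[i][d] - X[i][d])
                           + c_g * random.random() * (gbest[d] - X[i][d]))
                # keep positions on valid discrete server indices (sketch assumption)
                X[i][d] = int(max(0, min(num_servers - 1, round(X[i][d] + V[i][d]))))
            if fitness(X[i]) < fitness(pbest[i]):   # S3: individual best (minimization)
                pbest[i] = X[i][:]
        gbest = min(pbest, key=fitness)[:]          # S4: global best
    return gbest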
6. The task offloading and resource allocation optimization method of claim 2, wherein the method of establishing the objective function is:
the total delay $T_i^{total}$ and total energy consumption $E_i^{total}$ of an individual task are composed of the local, edge and cloud contributions,
where $T_i^{local}$ and $E_i^{local}$ are the time and energy consumption of local execution; $T_i^{edge}$ and $E_i^{edge}$ are the delay and energy consumption of offloading the task to an edge server; and $T_i^{cloud}$ and $E_i^{cloud}$ are the delay and energy consumption of offloading the task to the cloud server;
taking the trade-off between delay and energy consumption as the optimization objective, the overhead $\varphi_i^t$ of task $T_i^t$ is defined as:
$\varphi_i^t = \lambda_i^T T_i^{total} + \lambda_i^E E_i^{total}$
where $\lambda_i^T$ and $\lambda_i^E$ respectively denote the delay weight and the energy consumption weight, which can be dynamically adjusted according to the delay and energy requirements of different tasks;
the optimization objective is to minimize the overhead of the tasks generated by the K vehicles over the time horizon T, expressed as:
$\min \sum_{t=1}^{T} \sum_{i=1}^{K} \varphi_i^t \quad \text{s.t. C1-C6}$
where $\alpha_i^{local}, \alpha_i^{edge}, \alpha_i^{cloud}$ represent the proportions of the task offloaded to the local device, the edge server and the cloud server, respectively, and $B_{i,j}$, $Cpt_{i,j}$ are the bandwidth proportion and computing resource proportion allocated to task $T_i^t$, i.e. the communication and computing resource shares allocated by the j-th edge server to task i; C1 and C2 constrain the value range of the task offloading proportion decisions, meaning tasks can be divided in any proportion; C3 means the processing delay of a task does not exceed its maximum tolerable delay, ensuring the task responds on time; constraint C4 means the sum of the computing resources each MEC server allocates to the tasks it serves does not exceed the maximum resources the edge server itself owns; constraint C5 means the sum of the communication resources each MEC server allocates to the tasks within its coverage does not exceed its own maximum communication resources; constraint C6 specifies the value range of the delay and energy consumption weights.
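A short sketch of the overhead and objective of this claim, assuming the per-task totals have already been computed and the constraints C1-C6 are checked elsewhere:

def task_overhead(t_total, e_total, lam_t, lam_e):
    # weighted trade-off between delay and energy for one task
    return lam_t * t_total + lam_e * e_total

def system_objective(overheads_per_slot):
    # total overhead of all K vehicles' tasks over the T slots (to be minimized,
    # subject to constraints C1-C6 enforced by the decision process)
    return sum(sum(slot) for slot in overheads_per_slot)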
7. The task offloading and resource allocation optimization method of claim 2, wherein the method of building the state space is as follows:
the state space $s_t$ is:
$s_t = \{Task(t), F^{edge}(t), F^{local}(t), R(t)\}$
where $F^{edge}(t) = \{f_1^{edge}, f_2^{edge}, \ldots, f_M^{edge}\}$ represents the computing resources available on all edge servers at the current moment, $f_j^{edge}$ being the computing power of the corresponding edge server;
$F^{local}(t) = \{f_1^{local}, f_2^{local}, \ldots, f_K^{local}\}$ represents the computing resources available on all local vehicles at the current moment, $f_i^{local}$ representing the computing power of the vehicle; R(t) is a K×M matrix in which any element $r_{i,j}$ denotes the maximum uplink data transmission rate between the i-th local vehicle and the j-th edge server; if the vehicle is not within the coverage of the j-th base station, $r_{i,j} = 0$.
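An illustrative construction of $s_t$, assuming NumPy arrays shaped by the K-vehicle / M-server layout (the dictionary layout is a sketch convention, not the disclosed encoding):

import numpy as np

def build_state(tasks, f_edge, f_local, rates, coverage):
    # rates: K x M maximum uplink rates; coverage: K x M booleans, False where
    # vehicle i is outside base station j's cell, forcing r_ij = 0
    R = np.where(coverage, rates, 0.0)
    return {"Task": tasks,                      # Task(t): (b_i, c_i, tau_i^max) per vehicle
            "F_edge": np.asarray(f_edge),       # available edge computing resources
            "F_local": np.asarray(f_local),     # available local computing resources
            "R": R}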
8. The task offloading and resource allocation optimization method of claim 2, wherein the method of building an action space is:
each action $a_t$ in the action space is defined as:
$a_t = \{\alpha_i(t), B_i(t), Cpt_i(t)\}$
where $\alpha_i(t)$ represents the offloading proportion decision of the i-th task, and $B_i(t)$ and $Cpt_i(t)$ respectively represent the communication resource allocation decision and the computing resource allocation decision of the i-th edge server;
actions that would operate on an edge server with no remaining communication or computing resources are removed from the action space, yielding the effective action space;
after introducing the effective action space, when selecting an action with the epsilon-greedy strategy, it is judged whether the action belongs to the effective action space; if not, the action is re-selected; meanwhile, in each time interval, the effective action space is updated according to the resource conditions of the edge servers.
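A minimal sketch of the effective-action-space filter, assuming each candidate action records the server it touches and the resources it would consume (the dictionary keys are illustrative):

def effective_actions(actions, remaining_bw, remaining_cpu):
    # drop actions that would allocate on an edge server with no remaining
    # communication or computing resources
    valid = []
    for a in actions:
        j = a["server"]
        if remaining_bw[j] >= a["bandwidth"] and remaining_cpu[j] >= a["compute"]:
            valid.append(a)
    return valid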
9. The task offloading and resource allocation optimization method of claim 2, wherein the method of constructing and solving the reward function is as follows:
the optimization objective is to minimize the delay and energy consumption of tasks, minimize the system overhead and optimize the task success rate, so the reward function $R_t$ is defined as:
$R_t = -\sum_{i=1}^{K} \varphi_i^t$
where $T_i^t$ represents the task of the i-th vehicle at the t-th time slot; K is the number of local vehicles; $\varphi_i^t$ is the system overhead of task $T_i^t$;
and solving the task offloading decision and resource allocation process under multiple constraints with the PERDDQN algorithm:
randomly initializing the current Q network parameters $\theta_t$, initializing the target network Q' parameters $\theta'_t = \theta_t$, initializing the effective action space, and initializing the exploration rate $\varepsilon$;
starting from episode = 1 and running to training round M, first initializing the current state;
starting from t=1 and running to the number of time steps T in each round, execute:
selecting an action according to the current state using the epsilon-greedy strategy; when the selected action is not in the effective action space, reselecting until an action in the effective action space is selected;
performing the action and observing the reward r and the next state s';
saving the experience tuple (s, a, r, s'), i.e. the state, the action, the reward and the next-moment state obtained by executing the current action, into the priority experience replay pool, setting an initial priority $w_i$, and updating the current state s = s';
starting from j=0 and running to the minibatch size K, execute:
(1) sampling from the priority experience replay pool according to the set weights, with samples of large weight preferentially selected;
(2) obtaining the action $a^*$ with the maximum Q value under the current network, calculating the target network's Q value $Q'(s', a^*; \theta'_t)$, and setting the target value $Y_i^{PERDDQN} = r_i + \gamma Q'(s', a^*; \theta'_t)$, where $r_i$ is the reward and $\gamma$ is the reward discount factor;
(3) calculating the difference (TD error) between the target Q value and the current Q network's estimate, $\delta_i = Y_i^{PERDDQN} - Q(s, a; \theta_t)$; the larger the TD error, the stronger its effect in back-propagation and the faster the network parameters are trained;
(4) setting the sampling probability and priority weight of the experience data according to the TD error, $w_i = (K \cdot P_i)^{-\beta}$, and updating the sample priority, where $P_i$ is the sampling probability, $\beta$ is the sampling weight coefficient, $w_i$ is the priority weight, and $\delta_i$ is the TD error;
after adding priorities to the experience pool, the loss function is adjusted to the importance-weighted form $L(\theta_t) = \frac{1}{K}\sum_{i=1}^{K} w_i \delta_i^2$, where M is the number of edge servers;
calculating the gradient $\nabla_{\theta_t} L(\theta_t)$ and updating the current Q network parameters by gradient descent, where $\frac{\partial Q}{\partial \theta_t}$ is the derivative of the Q value with respect to $\theta_t$;
and updating the effective action space according to the remaining computing and communication resources of each edge server in the environment, and periodically updating the target network parameters at frequency F.
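The prioritized replay pool used in steps (1)-(4) can be sketched as follows; the alpha exponent on |TD error| and the small constant added for numerical safety are standard prioritized-experience-replay choices assumed here, while $w_i = (K \cdot P_i)^{-\beta}$ follows the claim:

import random

class PriorityReplay:
    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.data, self.prio = [], []
        self.capacity, self.alpha, self.beta = capacity, alpha, beta

    def add(self, sample, td_error=1.0):
        if len(self.data) >= self.capacity:        # drop the oldest sample
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(sample)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        total = sum(self.prio)
        probs = [p / total for p in self.prio]     # sampling probability P_i
        idxs = random.choices(range(len(self.data)), weights=probs, k=k)
        n = len(self.data)
        w = [(n * probs[i]) ** (-self.beta) for i in idxs]  # w_i = (K * P_i)^-beta
        return [self.data[i] for i in idxs], idxs, w

    def update(self, idx, td_error):
        self.prio[idx] = (abs(td_error) + 1e-6) ** self.alpha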
10. An end-edge cloud system architecture, characterized by comprising a local layer, an edge layer and a cloud layer in communication connection in sequence;
the local layer comprises a plurality of vehicles, and the vehicles carry dual-mode communication modules;
the edge layer comprises a base station, a road side unit and an edge processor;
the vehicles exchange data with the road side units through the dual-mode communication module and establish connections with the base stations; in each time interval, all vehicles send their own state and the task information they randomly generate at the current moment to the decision maker server;
the cloud layer comprises a cloud server that performs the method of any one of claims 2-9, optimizing task offloading and resource allocation.
CN202310780228.6A 2023-06-29 2023-06-29 Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture Pending CN117528649A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310780228.6A CN117528649A (en) 2023-06-29 2023-06-29 Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310780228.6A CN117528649A (en) 2023-06-29 2023-06-29 Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture

Publications (1)

Publication Number Publication Date
CN117528649A true CN117528649A (en) 2024-02-06

Family

ID=89751958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310780228.6A Pending CN117528649A (en) 2023-06-29 2023-06-29 Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture

Country Status (1)

Country Link
CN (1) CN117528649A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117793801A (en) * 2024-02-26 2024-03-29 北京理工大学 Vehicle-mounted task unloading scheduling method and system based on hybrid reinforcement learning
CN117793801B (en) * 2024-02-26 2024-04-23 北京理工大学 Vehicle-mounted task unloading scheduling method and system based on hybrid reinforcement learning

Similar Documents

Publication Publication Date Title
CN111556461B (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN112995913A (en) Unmanned aerial vehicle track, user association and resource allocation joint optimization method
CN112911648A (en) Air-ground combined mobile edge calculation unloading optimization method
CN111885155B (en) Vehicle-mounted task collaborative migration method for vehicle networking resource fusion
CN117528649A (en) Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture
CN115297171B (en) Edge computing and unloading method and system for hierarchical decision of cellular Internet of vehicles
CN113645273B (en) Internet of vehicles task unloading method based on service priority
CN116390161A (en) Task migration method based on load balancing in mobile edge calculation
CN113641504A (en) Information interaction method for improving multi-agent reinforcement learning edge calculation effect
CN113709249B (en) Safe balanced unloading method and system for driving assisting service
CN113821346A (en) Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
CN116009590B (en) Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN116634401A (en) Task unloading method for maximizing satisfaction of vehicle-mounted user under edge calculation
CN115766478A (en) Unloading method of air-ground cooperative edge computing server
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN114916013A (en) Method, system and medium for optimizing unloading time delay of edge task based on vehicle track prediction
CN115134242B (en) Vehicle-mounted computing task unloading method based on deep reinforcement learning strategy
Agbaje et al. Deep Reinforcement Learning for Energy-Efficient Task Offloading in Cooperative Vehicular Edge Networks
CN117544680B (en) Caching method, system, equipment and medium based on electric power Internet of things
CN115037751B (en) Unmanned aerial vehicle-assisted heterogeneous Internet of vehicles task migration and resource allocation method
Xu et al. Cooperative multi-player multi-armed bandit: Computation offloading in a vehicular cloud network
Zhang et al. A new dynamic clustering scheme for VANETs driven by deep reinforcement learning
CN114860345B (en) Calculation unloading method based on cache assistance in smart home scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination