CN117528649A

CN117528649A - Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture

Info

Publication number: CN117528649A
Application number: CN202310780228.6A
Authority: CN
Inventors: 曾令秋; 胡晗; 韩庆文; 雷瑜; 叶蕾
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2024-02-06

Abstract

The invention belongs to the technical field of intelligent transportation, and particularly discloses a method for establishing an end-edge cloud system architecture, a task unloading and resource allocation optimization method and the end-edge cloud system architecture, wherein the task unloading and resource allocation optimization method controls all vehicles to send task information randomly generated by the vehicles in each time interval to a decision maker server, calculates task migration cost, establishes a task unloading model and a task migration model, and establishes an objective function; constructing a state space and an action space of a deep reinforcement learning algorithm; and constructing a reward function according to the optimization target, and solving the reward function to realize task unloading and resource allocation optimization. By adopting the technical scheme, a task unloading and task migration model is provided, a task migration problem is solved by adopting a particle swarm algorithm based on a set, and a decision problem of task unloading and resource allocation is solved by adopting a deep reinforcement learning algorithm, so that the requirement of minimizing the long-term overhead of the system is met, and the convergence speed is increased.

Description

Method for establishing end-edge cloud system architecture, task unloading and resource allocation optimization method and end-edge cloud system architecture

Technical Field

The invention belongs to the technical field of intelligent transportation, and relates to a method for establishing an end-edge cloud system architecture, a task unloading and resource allocation optimizing method and the end-edge cloud system architecture.

Background

With the rapid development of intelligent traffic, novel vehicle-mounted applications and services such as intelligent auxiliary driving, real-time road conditions, image navigation and vehicle-mounted entertainment are introduced in a large quantity. Because of the fixed and limited computing and storage capabilities of local on-board devices, it is often difficult to meet the high latency and computing resource requirements of such tasks solely by local vehicle processing in the face of the explosive growth of intelligent on-board services. The introduction of multiple access edge computing (MEC) for vehicles under the end Bian Yun co-architecture provides powerful service capabilities for road vehicles.

The task unloading and resource allocation are always research hotspots and difficulties in the field of MEC and end-edge cloud cooperation, and the demands of the task on time delay and energy consumption can be better met by unloading the task to an edge server with rich resources and a cloud server for execution. Reasonable task offloading policies can improve network performance and quality of service (Quality of Service, qoS) of the end-edge cloud architecture.

The dynamics of the data requirements of the vehicle-mounted task, the dynamics of the calculation requirements and the dynamics of the available resources of the edge server provide challenges for the task offloading strategy. In the task unloading process, the limitation of bandwidth capacity and computational power resources is required to be fully considered so as to ensure that the calculation time delay and the communication transmission time delay can meet the response requirements of vehicle-mounted applications.

Therefore, the efficient execution of the task under the end-edge cloud architecture needs to fully analyze the vehicle-mounted application requirements on one hand, and on the other hand, needs to fully match the task unloading position, unloading proportion and resource allocation with the application requirements, namely, needs to study the task unloading position, unloading amount and communication resource allocation amount. In addition, when the computing resources and the communication resources of the current MEC server cannot meet the current task, the task is migrated to other MEC server computing processing, so that the task success rate can be improved, and the system overhead can be reduced.

In the prior art, the end-edge cloud system calculation unloading and resource allocation optimization scheme based on DQN (Deep Q-network, namely Q-Learning algorithm based on Deep Learning) has different bias of tasks on time delay and energy consumption, and DRL (Deep reinforcement Learning) methods such as DQN, Q-Learning, DDQN (Double DQN algorithm) and the like have the problems of over-estimation of the unavoidable Q value, slow convergence speed and the like, perform well in a large-state action space, and can bring higher estimation variance due to the use of a greedy strategy, and bootstrap can cause accumulated superposition of overestimate errors in the algorithm training process, so that the Q value estimation is further overlarge; and the methods do not adopt an experience playback mechanism based on priority, so that the algorithm learning efficiency is low and the convergence speed is low.

Disclosure of Invention

The invention aims to provide a method for establishing an end-edge cloud system architecture, a task unloading and resource allocation optimization method and the end-edge cloud system architecture, so as to obtain optimal decisions of task unloading and resource allocation and improve convergence speed.

In order to achieve the above purpose, the basic scheme of the invention is as follows: a method of establishing a terminal edge cloud system architecture, comprising the steps of:

the method comprises the steps of deploying a road side unit and base stations along a single line straight channel, spacing adjacent base stations by L meters, enabling the signal coverage radius of each base station to be L/2, and providing an edge server;

the vehicle and the road side unit realize data interaction and establish connection with the base station;

the system comprises K local vehicles, M edge servers and a cloud server, wherein each vehicle randomly generates a task in each time intervalThe method comprises the following steps:

wherein b _i For the amount of task data c _i The amount of computation required for the task unit data,tolerance time delay for the task; i denotes an i-th vehicle, t denotes a t-th time slot;

the time slot t local Task set Task (t) is:

in each time interval, all vehicles send their own states to a decision maker server;

the decision maker server collects the conditions and state information of available computing resources and communication resources of all edge servers in the end-edge cloud system at the current moment, and then selects action decisions based on the current state;

After the decision maker server makes action decisions, the result is returned to the local vehicle, the local vehicle executes the tasks by referring to the result sent by the decision maker server, the tasks are unloaded to the corresponding computing units, the system overhead is counted, and meanwhile rewards are computed and returned to the decision maker server.

The working principle and the beneficial effects of the basic scheme are as follows: according to the technical scheme, an end-edge cloud system architecture is established, multiple mobile devices and a multi-edge server are introduced, global observation is carried out by a decision maker server, action decision is carried out by combining the residual communication resources and the computing resources of the edge server, and the use is facilitated.

The invention also provides a task unloading and resource allocation optimizing method, which comprises the following steps:

according to the establishment method, an end-side cloud system architecture is established, and in each time interval, all vehicles send task information randomly generated by the vehicles at the current moment and the state of the vehicles to a decision maker server, wherein the types of the tasks are common, computationally intensive, data intensive or time delay sensitive;

based on the characteristics of different calculation tasks, taking the bias factors of time delay and energy consumption into consideration, introducing a load balancing factor, calculating the task migration cost, and constructing a task unloading model and a task migration model;

Establishing an objective function by using a task unloading model and a task migration model;

constructing a state space and an action space;

and constructing a reward function according to the optimization target, and solving the reward function to realize task unloading and resource allocation optimization.

According to the technical scheme, the calculation models of time delays and energy consumption of different layers of tasks under the end-to-end cloud architecture are defined, meanwhile, the situation that the edge server which is accessed by a vehicle at present cannot serve is considered, the unloaded target edge server is selected, the completion rate of the vehicle task within the task tolerance time delay is guaranteed, the task migration algorithm among the edge servers which provide services is optimized, the requirements of low time delay for completing computation-intensive tasks and relieving excessive load of computing resources are met, and the convergence speed is high.

Further, the method for constructing the task unloading model comprises the following steps:

the time delay and the processing energy consumption for unloading a task to different devices for execution comprise the time delay and the energy consumption of local calculation of the task; unloading to an MEC server through a wireless network, wherein the calculation time delay and the energy consumption of the MEC on the server are calculated; and time delay and energy consumption of unloading to the cloud server;

task local calculation time T _i ^local And energy consumptionThe method comprises the following steps of: />

Wherein,respectively represent task offloading to local and edge clothing Ratio of server to cloud server, f _i ^local Representing the magnitude of the computing power of the vehicle; k (k) _local The effective capacitance switching factor based on the chip architecture is the inherent energy consumption coefficient of the CPU of the local vehicle; b _i For the amount of task data c _i The amount of computation required for the task unit data;

the latency calculated by offloading task i to the edge server consists of three parts: data uploading time delay T _i ^up，eage Processing time delay T _i ^exe，eage And the result is returned and delayed, and the energy consumption of the task unloading to the MEC edge server is uploaded by the dataTreatment energy consumption->And return energy consumption composition, task->Time delay T offloaded onto MEC edge server _i ^edge And energy consumption->Expressed as:

wherein B is _i,j ,Cpt _i,j For allocation to tasksThe bandwidth proportion and the computing resource proportion of the task i respectively represent the communication and computing resource duty ratio allocated by the jth edge server; k (k) _edge Is the inherent energy consumption coefficient of CPU of the edge server, f _j ^eage Computing power for the corresponding edge server; r is (r) _i,j For task iMaximum uplink data rate between the local vehicle and the edge server; p is p _i Signal transmitting power for the local vehicle;

when the task is unloaded to the cloud server for execution, the task data is firstly uploaded to the MEC edge server by means of transfer of the MEC edge server, then the MEC edge server is uploaded to the cloud server, the transmission delay from the edge server to the cloud server is regarded as a fixed value FD, and the delay T of the task unloading to the cloud server _i ^cloud And energy consumptionExpressed as:

wherein T is _i ^up，cloud And (5) time delay for uploading the data to the cloud server.

And constructing a task unloading model, and improving the network efficiency and the service quality of the end-edge cloud architecture.

Further, the specific method for constructing the task migration model is as follows:

performing overall task migration, wherein in the migration process, the migration time delay is prolongedThe method comprises the following steps:

wherein,representing the upload delay to the nearest edge server,/-for the most recent edge server>Representative of discharging tasksTransmission delay carried to target edge server according to particle swarm algorithm>Representing the time delay spent executing.

The operation is simple, and the use is facilitated.

Further, a collection-based particle swarm algorithm is adopted to select a server when task unloading is carried out:

setting the particle position code to represent the edge server selection scheme of all the current tasks, which is recorded asIf there are m edge servers in total, x is _i J (1. Ltoreq.j.ltoreq.m) represents that the ith task selects j edge servers as unloading objects of the edge layer, and the speed of the particles represents the trend of the current task selecting other edge servers for task unloading;

the method comprises the following specific steps:

s1, initializing a particle swarm, wherein the particle swarm comprises a particle swarm scale, a speed and a position of each particle, and simultaneously initializing an individual optimal position and a global optimal position of each particle;

S2, triggering an iteration every time a request comes, combining the current task attribute and available resources of each edge server, and according to the fitness functionEvaluating the fitness of each particle, wherein K is the number of local vehicles, +.>Representing the maximum tolerable delay of a task, < >>For migration delay _i A penalty factor is indicated and is indicated,g is a timeout penalty coefficient;

s3, updating the individual optimal position of each particle: if the particle current position fitness value is better than the particle historical optimal position fitness value, setting the current position as a new individual optimal position;

s4, updating the global optimal position: in the individual optimal positions of all particles, selecting the position with the optimal fitness value as the global optimal position, simultaneously taking the group optimal position after the iterative updating of the round as the solution of the request, analyzing the group optimal position code to know the serial number of the edge server selected by each task in the current request, and realizing the edge server selection of the particle swarm algorithm based on the set;

s5, according toAnd->Updating particle velocity and position, wherein +.>Representing the position of the particle i at the t-th iteration; />And->The speeds of the particles i at t+1 iterations and t iterations respectively; / >Representing the inertial direction, namely the current speed of the particles is influenced by the last speed, wherein the larger the value of the inertial weight omega is, the stronger the global exploration capacity is; cp is an individual learning factor, represents the local optimal propulsion force of particles to individual cognition, cg is a group learning factor, and represents the global optimal propulsion force of particles to group cognition; random, random is a random number with a value between 0 and 1; at each iteration, the particles move positions under the combined action of the previous movement trend, the self cognition and the group cognition and gradually converge to the global optimal position, and the current position of the particles is determined by the last positionTogether with the velocity, the particle location update procedure is expressed as:

x _i ^t+1 ＝x _i ^t +v _i ^t ；

s6, judging whether a new request is generated, if no request exists, ending the algorithm, otherwise, returning to the step S2.

The method has the advantages that the aggregate-based particle swarm algorithm is used for selecting the unloaded target edge servers, the completion rate of the vehicle task within the task tolerance time delay is guaranteed, the task migration algorithm among the edge servers providing the service is optimized, and the requirements of completing the computationally intensive task for low time delay and relieving the excessive load of computational resources are met.

Further, the method for establishing the objective function comprises the following steps:

Total time delay of individual tasksAnd total energy consumption->The method comprises the following steps:

wherein,time and energy consumption for local execution; />Time delay and energy consumption for offloading tasks to an edge server; />Time delay and energy consumption for offloading tasks to a cloud server;

taking the trade-off of time delay and energy consumption as an optimization target, and defining tasksIs->The method comprises the following steps:

wherein,and->The time delay weight and the energy consumption weight are respectively expressed, and can be dynamically adjusted according to the requirements of different tasks on time delay and energy consumption;

the optimization objective is to minimize the overhead of tasks generated during the time of K vehicles T, expressed as:

s.t C1:

C2:

C3:

C4:

C5:

C6:

wherein,respectively are provided withB represents the ratio of task offloading to local, edge and cloud servers _i,j ,Cpt _i,j For allocation to tasks->The bandwidth proportion and the computing resource proportion of the task i respectively represent the communication and computing resource duty ratio allocated by the jth edge server; c1 The C2 constrains the value range of the task unloading proportion decision, which means that the tasks can be divided according to any proportion; c3 represents that the processing time delay of the task does not exceed the maximum tolerable time delay of the task, and the task is ensured to respond on time; constraint C4 indicates that the sum of the computing resources allocated by each MEC server for the plurality of tasks it serves does not exceed the maximum resources owned by the edge server itself; similarly, constraint C5 indicates that each MEC server allocates a sum of communication resources for a plurality of tasks within its coverage area, not exceeding its own maximum communication resource; constraint C6 represents the range of values for the delay and the energy consumption weight.

And an objective function is established, multiple constraints are set, and subsequent task unloading and resource allocation solving are facilitated.

Further, the method of building the state space is as follows:

state space s _t The method comprises the following steps:

s _t ＝{Task(t),F ^edge (t),F ^local (t),R(t)}

wherein,representing the computing resources available to all edge servers at the current time, f _j ^eage Computing power for the corresponding edge server; />Indicating the computing resources available to all local vehicles at the current moment,/->Representing the magnitude of the computing power of the vehicle; r (t) is a matrix of KxM, any element R in the matrix _i,j Representing the ith local vehicleMaximum data uplink transmission rate with the jth edge server, if the vehicle is not within the coverage area of the jth base station, r _i,j ＝0。

The global observation of the decision maker server forms a state space, the state space contains a task information set at the current moment, communication resources available to each edge server, computing resources of all local vehicles and the maximum data uploading rate between all local vehicles and the edge servers are realized, and long-term system overhead is minimized.

Further, the method for constructing the action space comprises the following steps:

each action a in the action space _t The definition is as follows:

wherein,an offload proportion decision representing the ith task, < +. >And->Communication resource allocation decisions and computing resource allocation decisions respectively representing the ith edge server;

related actions for operating the edge server without residual communication resources and computing resources are removed from the action space, so that an effective action space is obtained;

after introducing an action space, when selecting an action by using an epsilon-greedy strategy, judging whether the action belongs to an effective action space or not, and if the action is not in the effective action space, re-selecting; meanwhile, in each time interval, the effective action space is updated in combination with the resource condition of the edge server.

After the current state selects to execute a certain action in the action space, the current state is converted into a new state in the state space, and the environment gives rewards to the agent according to the established function so as to guide the selection of the follow-up action.

Further, the method of constructing and solving the bonus function is as follows:

the optimization objective is to minimize the time delay and energy consumption of the task, minimize the system overhead and optimize the success rate of the task, thus rewarding the function R _t The definition is as follows:

wherein,a task representing an ith vehicle at a t-th time slot; k is the number of local vehicles;for tasks->Is a system overhead of (1);

And solving a task unloading decision and a resource allocation process under a multi-constraint condition by adopting a PERD DEQN algorithm:

random initialization of current Q network parameters θ _t Initializing parameter θ 'of target network Q' _t ＝θ _t Initializing an effective action space and initializing an exploration rate epsilon;

starting from epoode=1, to training round M, first initializing the current state;

starting from t=1, the number of time steps T to each round, starts to execute:

selecting actions according to the current state by using an epsilon-greedy strategy, and when the selected actions are not in the effective action space, reselecting the actions until the actions in the effective action space are selected;

performing an action, observing the prize r and the next state s';

saving the experience tuple state, the action, the rewards and the next time state (s, a, r, s') obtained by executing the current action into a priority experience playback pool, and setting initial priorityw _i And updating the current state s=s';

starting from j=0, to miniband K, execution starts:

(1) Sampling is carried out from a priority experience playback pool according to the set weight, and samples with large weights are preferentially selected;

(2) Action a of obtaining maximum Q value based on current network ^* Calculating the Q value Q '(s', a) of the target network ^* ；θ _t ) Setting a target value Y _i ^PERDDQN ＝r _i +γQ′(s′,a ^* ；θ _t ) Wherein r is _i For rewards, γ is a rewards discount factor;

(3) Calculating the difference delta between the target Q value and the current Q network estimated value _i ＝Y _i ^PERDDQN -Q(s,a；θ _t ) The larger the TD error is, the larger the counter-propagation effect is, and the faster the training speed of network parameters can be;

(4) Setting sampling probability and priority weight of empirical data according to TD errorAnd updating the sample priority, wherein P _i Is sampling probability, beta is sampling weight coefficient, w _i Is the priority weight, delta _i Is a TD error;

after adding priority to the experience pool, the loss function is adjusted as follows:where M is the number of edge servers;

calculating gradientsAnd updates the current Q network parameters in a gradient descent method, wherein,is Q value to theta _t Is a derivative of (2);

and updating the effective action space according to the computing resource and the communication resource residual condition of each edge server in the environment and periodically updating the target network parameters according to the frequency F.

The optimization problem aims at minimizing the long-term overhead of the system, and the final goal of the deep reinforcement learning is to maximize the long-term expected rewards, so that the magnitude of the rewards is set in negative correlation with the system overhead.

The invention also provides a terminal edge cloud system architecture, which comprises a local layer, an edge layer and a cloud layer which are sequentially in communication connection;

The local layer comprises a plurality of vehicles, and the vehicles carry dual-mode communication modules;

the edge layer comprises a base station, a road side unit and an edge processor; the vehicles interact with the road side unit data through the dual-mode communication module, connection is established with the building station, and in each time interval, all vehicles send the self state and task information randomly generated by the vehicles at the current moment to a decision maker server;

the cloud layer comprises a cloud server, and the cloud server executes the task unloading and resource allocation optimizing method and optimizes task unloading and resource allocation.

By utilizing the framework, task unloading and resource allocation optimization are realized, the algorithm learning efficiency is high, and the convergence speed is high.

Drawings

FIG. 1 is a flow diagram of a task offloading and resource allocation optimization method of the present invention;

fig. 2 is a schematic structural diagram of a terminal cloud system architecture according to the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.

In the description of the present invention, unless otherwise specified and defined, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanical or electrical, or may be in communication with each other between two elements, directly or indirectly through intermediaries, as would be understood by those skilled in the art, in view of the specific meaning of the terms described above.

The invention discloses a method for establishing a terminal edge cloud system architecture, which comprises the following steps:

deploying a Road Side Unit (RSU) and base stations (eNB) along a single straight channel, spacing adjacent base stations by L meters, and configuring an edge server, wherein the signal coverage radius of each base station is L/2;

The vehicle and the road side unit realize data interaction and establish connection with the base station; the vehicle is provided with a PC5/UU dual-mode communication module, realizes data interaction with an RSU (Road Side Unit) through a PC5 mode, and establishes connection with an eNB through a UU port;

the system comprises K local vehicles, M edge servers (not including a decision maker server) and a cloud server, wherein each vehicle randomly generates a task in each time intervalThe method comprises the following steps:

wherein b _i The unit is bit (bit) which is the task data volume; c _i The amount of computation required for the task unit data, in cycles/bit,tolerance time delay for the task; i denotes an i-th vehicle, t denotes a t-th time slot; task types include general, computationally intensive, data intensive, and latency sensitive. The preferences of the tasks on time delay and energy consumption are different, the time delay sensitive task is more important to optimize the time delay, and timely response of the tasks is guaranteed, so that the time delay weight is higher. The data-intensive task and the computation-intensive task respectively generate a large amount of energy consumption in the data transmission and computation processes, and the two tasks have wider requirements on time delay and higher energy consumption preference compared with the two tasks.

The vehicle task offloading dividing ratio and the remaining calculation and communication resources of the MEC server affect the task offloading selection to the MEC server, and when the vehicle is divided into the portions offloaded to the MEC server, the selection of the target server for task execution needs to be globally made, and this portion affects the task offloading transmission delay, thereby increasing the total delay of offloading to the MEC server execution. The task quantity with different sizes has different task tolerance time delay, and the tasks which exceed the tolerance time delay and are not completed are defined as failed tasks, so that the task success rate is defined: the total number of tasks/tasks that are successfully completed within the task tolerance time delay is reached. The time slot t local Task set Task (t) is:

in each time interval, all vehicles send own states to a decision maker server (a single edge server is used as a decision making server), wherein the own states of the vehicles comprise information such as vehicle computing capacity, position and the like, task information randomly generated by the vehicles at the current moment and the like;

after collecting the available computing resources and communication resource conditions of all edge servers in the end-edge cloud system at the current moment and state information (the state information is the conditions of the available computing resources and communication resources of the edge servers in the system and the conditions of the generating tasks of vehicles and the local computing resources), a decision maker server selects action decisions based on the current state;

The invention also provides a task unloading and resource allocation optimizing method, which takes the weighted sum of the minimum time delay and the energy consumption as an optimizing target, combines the task unloading and the resource allocation of the end-edge cloud system, and considers the dynamic characteristics of different vehicle-mounted tasks, the conditions of communication resources, computing resource states and the like. And (3) providing a task unloading and task migration model, solving a task migration problem by adopting a set-based particle swarm optimization (SPSO), and solving a decision problem of task unloading and resource allocation by adopting a deep reinforcement learning framework PERDDQN (deep reinforcement learning per-ddqn algorithm with preferential playback) so as to meet the requirement of minimizing the long-term overhead of the system.

As shown in fig. 1, the task offloading and resource allocation optimization method includes the steps of:

constructing a state space and an action space of one of a deep reinforcement learning algorithm, a game theory, a genetic algorithm, an ant colony algorithm or Lyapunov (Lyapunov) optimization;

In a preferred scheme of the invention, the task unloading model describes the time delay and the processing energy consumption for unloading a task to different equipment to execute, and the task unloading model consists of three parts, wherein one part is the local calculation time delay and the time delay of the task, the other part is the calculation time delay and the energy consumption on a server through a wireless network and the other part is the calculation time delay and the energy consumption which are unloaded to a cloud server. The method for constructing the task unloading model comprises the following steps:

Task local calculation time T _i ^local And energy consumptionThe method comprises the following steps of:

wherein,representing the proportion of task offloading to local, edge server and cloud server, respectively, +.>Representing the magnitude of the computing power of the vehicle; k (k) _local The effective capacitance switching factor based on the chip architecture is the inherent energy consumption coefficient of the CPU of the local vehicle; b _i For the amount of task data c _i The amount of computation required for the task unit data;

in the end-edge cloud architecture, the local vehicle communicates wirelessly with an edge server. Irrespective of the communication interference between the local vehicle and the edge server, a maximum upstream data rate r between the local vehicle and the edge server for task i is generated, which is available according to shannon's formula _i,j The method comprises the following steps:

wherein W represents the channel bandwidth of the base station, p _i Signal transmission power g for local vehicle _i Sigma for channel gain of local vehicle and edge server ² Is a gaussian white noise power in a wireless communication environment;

the latency calculated by offloading task i to the edge server consists of three parts: data uploading time delay T _i ^up，eage Processing time delay T _i ^exe,edge And a result return delay. Since the amount of task result data is much smaller than the amount of data uploaded, the result return delay to return the calculation result from the edge server to the local vehicle is negligible. Similar to the computation delay, the energy consumption of the task offloading to the MEC edge server is the energy consumption of the data uploading Treatment energy consumption->And return energy consumption composition, task->Time delay T offloaded onto MEC edge server _i ^edge And energy consumption->Expressed as:

wherein B is _i,j ,Cpt _i,j For allocation to tasksRespectively representing the bandwidth proportion and the computing resource proportion of the j-th edge server allocated to the task iCommunication and computing resource duty cycles; k (k) _edge Is the inherent energy consumption coefficient of CPU of the edge server, f _j ^edge Computing power for the corresponding edge server;

when the task is unloaded to the cloud server for execution, the task data is firstly uploaded to the MEC edge server by means of the transfer of the MEC edge server, and then uploaded to the cloud server by the MEC edge server, the physical position of the cloud server is far away and has massive computing resources, the transmission Delay from the edge server to the cloud server is regarded as a Fixed value (FD), the executed Delay and the energy consumption generated by the wired communication are negligible, and the Delay of the task unloading to the cloud serverAnd energy consumption->Expressed as:

wherein,and (5) time delay for uploading the data to the cloud server.

In a preferred scheme of the invention, the specific method for constructing the task migration model is as follows:

after the vehicle offloads the task to the MEC server beside the BS, the BS decides whether to migrate the computing task to the MEC server beside other BSs (base station), and because the situation that the MEC server currently accessed by the vehicle is insufficient in remaining communication resources and computing resources, the problem of which server is the object of offloading needs to be solved for each task. Since the task offloaded from the vehicle to the MEC server is already part of the task, it is in progress When the row task is migrated, the task is not split any more, but the whole task is migrated, and in the migration process, the migration time delay is prolongedThe method comprises the following steps:

wherein,representing the upload delay to the nearest edge server,/-for the most recent edge server>Transmission delay representing offloading of tasks to target edge server according to particle swarm algorithm, +.>Representing the time delay spent executing.

In a preferred scheme of the invention, a collection-based particle swarm optimization (SPSO) is adopted for selecting a server during task unloading:

the method comprises the following specific steps:

s2, triggering an iteration every time a request comes, combining the current task attribute and available resources of each edge server, and according to the adaptability Function ofEvaluating the fitness of each particle, wherein K is the number of local vehicles, +.>Representing the maximum tolerable delay of a task, < >>For migration delay _i A penalty factor is indicated and is indicated,g is a timeout penalty coefficient;

s5, according toAnd->Updating particle velocity and position, wherein +.>Representing the position of the particle i at the t-th iteration; />And->The speeds of the particles i at t+1 iterations and t iterations respectively; />Representing the inertial direction, namely the current speed of the particles is influenced by the last speed, wherein the larger the value of the inertial weight omega is, the stronger the global exploration capacity is; cp is an individual learning factor, represents the local optimal propulsion force of particles to individual cognition, cg is a group learning factor, and represents the global optimal propulsion force of particles to group cognition; random, random is a random number with a value between 0 and 1; at each iteration, the particles move positions under the combined action of the previous movement trend, the self cognition and the group cognition and gradually converge on the global optimal position, the current position of the particles is determined by the position and the speed of the last time, and the particle position updating process is expressed as follows:

x _i ^t+1 ＝x _i ^t +v _i ^t ；

In a preferred embodiment of the present invention, the method for establishing the objective function includes:

total time delay of individual tasksAnd total energy consumption->The method comprises the following steps: />

Wherein,time and energy consumption for local execution; />Time delay and energy consumption for offloading tasks to an edge server; />Time delay and energy consumption for offloading tasks to a cloud server; the total delay includes the maximum of the calculation delay of the local execution of the task, the uploading delay of the task to the MEC (edge) server and the calculation delay (if the SPSO algorithm determines that the target MEC server executing the task is not the MEC server currently located, and the transmission delay between the MEC servers needs to be added), the uploading delay of the task to the cloud server and the calculation delay. The total energy consumption includes a sum of the calculated energy consumption and the transmission energy consumption.

s.t C1:

C2:

C3:

C4:

C5:

C6:

Wherein,representing the ratio of task offloading to local, edge server and cloud server, respectively, B _i,j ,Cpt _i,j For allocation to tasks->The bandwidth proportion and the computing resource proportion of the task i respectively represent the communication and computing resource duty ratio allocated by the jth edge server; c1 The C2 constrains the value range of the task unloading proportion decision, which means that the tasks can be divided according to any proportion; c3 represents that the processing time delay of the task does not exceed the maximum tolerable time delay of the task, and the task is ensured to respond on time; constraint C4 indicates that the sum of the computing resources allocated by each MEC server for the plurality of tasks it serves does not exceed the maximum resources owned by the edge server itself; similarly, constraint C5 indicates that each MEC server allocates a sum of communication resources for a plurality of tasks within its coverage area, not exceeding its own maximum communication resource; constraint C6 represents the range of values for the delay and the energy consumption weight.

To minimize long-term overhead, the offloading proportions of local vehicle tasks and the MEC server communication resources and computing resource allocations are comprehensively decided, so that the proposed vehicle part computing task offloading problem is solved using a deep reinforcement learning algorithm. First, the problem is defined as a Markov Decision Process (MDP), and then solved by using a PERD DEQN algorithm to explore a more efficient and stable task offloading strategy, so as to cope with more complex and changeable traffic scenes.

An edge server is used as a decision maker and is used as an agent (agent) in DRL (deep reinforcement learning), so that a state space, an action space and a rewarding function in reinforcement learning are defined.

In a preferred embodiment of the present invention, the state space contains a set of task information at the current time, the available communication resources of each MEC server, the computing resources of all local vehicles and the maximum data upload rate between all local vehicles and the MEC server. The task information also includes data volume, computation volume, and maximum tolerable delay. The action space contains the task offload ratio, the communication resources and the computing resource allocation of the ith MEC server. The method for constructing the state space of the deep reinforcement learning algorithm is as follows:

to minimize long-term overhead, the state space contains factors that affect task offloading decisions, mainly task own attributes and system available resources. State space s _t The method comprises the following steps:

s _t ＝{Task(t),F ^edge (t),F ^local (t),R(t)}

wherein,indicating the computing resources available to all edge servers at the current moment,/->Computing power for the corresponding edge server; />Indicating the computing resources available to all local vehicles at the current moment,/->Representing the magnitude of the computing power of the vehicle; r (t) is a matrix of KxM, any element R in the matrix _i,j Represents the ithMaximum data uplink transmission rate between local vehicle and jth edge server, if vehicle is not in coverage of jth base station, r _i,j ＝0。

In a preferred scheme of the invention, the method for constructing the action space of the deep reinforcement learning algorithm comprises the following steps:

each action a in the action space _t The definition is as follows:

wherein,an offload proportion decision representing the ith task, < +.>And->Communication resource allocation decisions and computing resource allocation decisions respectively representing the ith edge server;

the solution to the action space is the solution to the next unloading decision and the communication and computing resource allocation decision, and the sum of the communication resources allocated to all tasks by the edge server cannot exceed the available communication resources thereof, and the sum of the computing resources allocated to all tasks cannot exceed the available computing resources thereof due to the limited communication resources and computing resources of the edge server. Related actions for operating the edge server without residual communication resources and computing resources are removed from the action space, so that an effective action space is obtained;

In a preferred scheme of the invention, in order to meet the requirement of improving the success rate of the task, namely, under the condition that the total time delay of the task is smaller than the maximum time delay tolerated by the task, the long-term overhead of the system is optimized through decision of task unloading and resource allocation, and the system overhead is a weighted sum of the total time delay and the total energy consumption. The optimization problem aims at minimizing the long-term overhead of the system, and the final goal of the deep reinforcement learning is to maximize the long-term expected rewards, so that the magnitude of the rewards is set in negative correlation with the system overhead.

Problem optimization aims at minimizing the time delay and energy consumption of tasks, minimizing the system overhead and optimizing the success rate of tasks. The method for constructing and solving the reward function is as follows:

randomly initializing Q network parameters θ in a current DQN (deep Q network) _t (θ _t Is a parameter of the network in deep reinforcement learning, which is adjusted by gradient descent), and the parameter θ ' of the target network Q ' is initialized ' _t ＝θ _t (based on the DDQN (double DQN) model, a double-layer Q network, so the second parameter is that of the second Q network), initializing the effective action space, and initializing the exploration rate epsilon;

starting from epoode=1 (epoode is a term of machine learning, M rounds of training, i.e., M epoode at a time), to training round M, the current state is initialized first;

starting from t=1, the number of time steps T to each round, starts to execute:

selecting actions according to the current state by using an epsilon-greedy strategy, and when the selected actions are not in the effective action space, reselecting the actions until the actions in the effective action space are selected; epsilon-greedy strategy is a strategy for reinforcement learning exploration based on current state and actions, exploration: meaning that the agent selects other unknown actions beyond the known (state-action) binary composition. The epsilon-greedy strategy is a common strategy, which indicates that when an agent makes a decision, a small positive probability is given to randomly select an unknown action, and the rest probability is given to select the action with the highest action value in the past.

Performing an action, observing the prize r and the next state s';

saving the experience tuple state, the action, the rewards and the next time state (s, a, r, s') obtained by executing the current action into a priority experience playback pool, and setting an initial priority w _i And updating the current state s=s'; s, a, r, s' are respectively states, actions, rewards and the next time state;

starting from j=0, to miniband K (miniband is a machine learning term where the batch of each epoode sample is K), execution begins:

(3) Calculating a difference (TD error) delta between the target Q value and the current Q network estimate _i ＝Y _i ^PERDDQN -Q(s,a；θ _t ) The larger the TD error is, the larger the counter-propagation effect is, and the faster the training speed of network parameters can be;

after adding priority to the experience pool, the loss function is adjusted as follows:where M is the number of edge servers; />

According to the invention, optimization research is carried out on task unloading and resource allocation strategies under a terminal edge cloud architecture, and the task information of the vehicle and the information of the MEC edge server cluster are collected; these elements are used to construct a state space, a target MEC server is selected according to the SPSO algorithm, noise is added to search a neural network, and an action in an effective action space is determined. And compared with the DRL algorithms such as DDPG, DQN and the like, the PERDDQN algorithm has the advantages of double-Q-value network and delayed updating, so that the network training achieves the effects of faster convergence, higher success rate and more accuracy.

In the PERDDQN algorithm, a dual-neural network mechanism is used for decoupling the selection and evaluation of actions, a relatively smaller value is selected as a network updating target, the influence of overestimation of the target Q value on the estimation of the target Q value in the current Q network is avoided, and the problem of overestimation of the Q value is effectively solved. Meanwhile, the PERDDQN algorithm introduces a concept of priority to samples in the experience pool, and sets priority for each sample according to the absolute value of the TD error of the sample. The concept of effective action space is introduced, nonsensical action selection is reduced, and the learning of an algorithm on important samples is enhanced, so that the algorithm convergence is promoted.

According to the technical scheme, the calculation models of time delays and energy consumption of different layers of tasks under the end-edge cloud architecture are defined, meanwhile, the situation that the currently accessed MEC server of the vehicle cannot be served is considered, the SPSO algorithm is used for selecting the unloaded target MEC server, the completion rate of the vehicle task within the task tolerance time delay is guaranteed, the task migration algorithm among the MEC servers providing the service is optimized, and the requirements of completing the low time delay of the computationally intensive task and relieving the excessive load of the computing resources are met.

The invention also provides a terminal edge cloud system architecture, which is shown in figure 2 and comprises a local layer, an edge layer and a cloud layer which are sequentially in communication connection; the local layer comprises a plurality of vehicles, and the vehicles carry dual-mode communication modules; the edge layer comprises a base station, a road side unit and an edge processor; the vehicles interact with the road side unit data through the dual-mode communication module, connection is established with the building station, and in each time interval, all vehicles send the self state and task information randomly generated by the vehicles at the current moment to a decision maker server; the cloud layer comprises a cloud server, and the cloud server executes the task unloading and resource allocation optimizing method and optimizes task unloading and resource allocation. By utilizing the framework, task unloading and resource allocation optimization are realized, the algorithm learning efficiency is high, and the convergence speed is high.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method of establishing a terminal edge cloud system architecture, comprising the steps of:

the time slot t local Task set Task (t) is:

2. A method for task offloading and resource allocation optimization, comprising the steps of:

the method of claim 1, wherein a terminal cloud system architecture is established, and in each time interval, all vehicles send task information randomly generated by the vehicles at the current moment and the state of the vehicles to a decision maker server, wherein the types of the tasks are common, computationally intensive, data intensive or time delay sensitive;

constructing a state space and an action space;

3. The task offloading and resource allocation optimization method of claim 2, wherein the method of constructing a task offloading model is:

Wherein,representing the ratio of task offloading to local, edge server and cloud server, respectively, f _i ^local Representing the magnitude of the computing power of the vehicle; k (k) _local The effective capacitance switching factor based on the chip architecture is the inherent energy consumption coefficient of the CPU of the local vehicle; b _i For the amount of task data c _i The amount of computation required for the task unit data;

the latency calculated by offloading task i to the edge server consists of three parts: data uploading time delay T _i ^up,edge Processing time delay T _i ^exe,edge And the result is returned and delayed, and the energy consumption of the task unloading to the MEC edge server is uploaded by the dataTreatment energy consumption->And return energy consumption composition, task->Time delay T offloaded onto MEC edge server _i ^edge And energy consumptionExpressed as:

wherein B is _i,j ,Cpt _i,j For allocation to tasksThe bandwidth proportion and the computing resource proportion of the task i respectively represent the communication and computing resource duty ratio allocated by the jth edge server; k (k) _edge Is the inherent energy consumption coefficient of CPU of the edge server, f _j ^edge Computing power for the corresponding edge server; r is (r) _i,j Maximum uplink data rate between the local vehicle and the edge server for task i; p is p _i Signal transmitting power for the local vehicle;

when the task is unloaded to the cloud server for execution, the task data is firstly uploaded to the MEC edge server by means of transfer of the MEC edge server, then the MEC edge server is uploaded to the cloud server, the transmission delay from the edge server to the cloud server is regarded as a fixed value FD, and the delay of the task unloading to the cloud server And energy consumption->Expressed as:

wherein T is _i ^up,cloud And (5) time delay for uploading the data to the cloud server.

4. The task offloading and resource allocation optimization method of claim 2, wherein the specific method of constructing the task migration model is as follows:

5. The method for task offloading and resource allocation optimization of claim 2, wherein a set-based particle swarm algorithm is used for selection of a server at task offloading:

setting the particle position code to represent the edge server selection scheme of all the current tasks, which is recorded asIf there are m edge servers in total, x _i J (1. Ltoreq.j.ltoreq.m) represents that the ith task selects j edge servers as the object of unloading for the edge layer, the speed of the particles represents the current task selecting other edge servers for task unloadingTrend size of the load;

the method comprises the following specific steps:

s5, according toAnd x _i ^t ⁺¹ ＝x _i ^t +v _i ^t Updating particle velocity and position, where x _i ^t Representing the position of the particle i at the t-th iteration; v _i ^t+1 And v _i ^t The speeds of the particles i at t+1 iterations and t iterations respectively; ωv _i ^t Representing the inertial direction, namely the current speed of the particles is influenced by the last speed, wherein the larger the value of the inertial weight omega is, the stronger the global exploration capacity is; c _p For individual learning factors, representing the locally optimal propulsion of particles to individual cognition, c _g The population learning factors represent the global optimal propulsion force of particles to population cognition; rand of _p ,rand _g Random numbers which are all between 0 and 1; at each iteration, the particles move positions under the combined action of the previous movement trend, the self cognition and the group cognition and gradually converge on the global optimal position, the current position of the particles is determined by the position and the speed of the last time, and the particle position updating process is expressed as follows:

x _i ^t+1 ＝x _i ^t +v _i ^t ；

6. The task offloading and resource allocation optimization method of claim 2, wherein the method of establishing the objective function is:

wherein,time and energy consumption for local execution; t (T) _i ^edge ，/>Time delay and energy consumption for offloading tasks to an edge server; t (T) _i ^cloud ，/>Time delay and energy consumption for offloading tasks to a cloud server;

wherein,representing the ratio of task offloading to local, edge server and cloud server, respectively, B _i,j ,Cpt _i,j For allocation to tasks->The bandwidth proportion and the computing resource proportion of the task i respectively represent the communication and computing resource duty ratio allocated by the jth edge server; c1 The C2 constrains the value range of the task unloading proportion decision, which means that the tasks can be divided according to any proportion; c3 represents that the processing time delay of the task does not exceed the maximum tolerable time delay of the task, and the task is ensured to respond on time; constraint C4 indicates that the sum of the computing resources allocated by each MEC server for the plurality of tasks it serves does not exceed the maximum resources owned by the edge server itself; constraint C5 indicates that each MEC server allocates a sum of communication resources for a plurality of tasks within its coverage area that does not exceed its own maximum communication resource; constraint C6 represents the value of time delay and energy consumption weight Range.

7. The task offloading and resource allocation optimization method of claim 2, wherein the method of building the state space is as follows:

state space s _t The method comprises the following steps:

s _t ＝{Task(t)，F ^edge (t)，F ^local (t)，R(t)}

wherein,representing the computing resources available to all edge servers at the current time, f _j ^edge Computing power for the corresponding edge server;

representing the computing resources available to all local vehicles at the current time, f _i ^local Representing the magnitude of the computing power of the vehicle; r (t) is a matrix of KxM, any element R in the matrix _i,j Indicating the maximum data uplink transmission rate between the ith local vehicle and the jth edge server, if the vehicle is not within the coverage of the jth base station, r _i,j ＝0。

8. The task offloading and resource allocation optimization method of claim 2, wherein the method of building an action space is:

each action a in the action space _t The definition is as follows:

9. The task offloading and resource allocation optimization method of claim 2, wherein the method of constructing and solving the reward function is as follows:

starting from t=1, the number of time steps T to each round, starts to execute:

performing an action, observing the prize r and the next state s';

saving the experience tuple state, the action, the rewards and the next time state (s, a, r, s') obtained by executing the current action into a priority experience playback pool, and setting an initial priority w _i And updating the current state s=s';

starting from j=0, to miniband K, execution starts:

(4) Setting sampling probability and priority weight w of empirical data according to TD error _i ＝(K·P _i ) ^-β ，And update the samplePriority, where P _i Is sampling probability, beta is sampling weight coefficient, w _i Is the priority weight, delta _i Is a TD error;

10. The end-edge cloud system architecture is characterized by comprising a local layer, an edge layer and a cloud layer which are sequentially in communication connection;

the edge layer comprises a base station, a road side unit and an edge processor;

the vehicles interact with the road side unit data through the dual-mode communication module, connection is established with the building station, and in each time interval, all vehicles send the self state and task information randomly generated by the vehicles at the current moment to a decision maker server;

the cloud layer comprising a cloud server performing the method of one of claims 2-9, optimizing task offloading and resource allocation.