CN115934192B - B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method - Google Patents

B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method

Info

Publication number
CN115934192B
Authority
CN
China
Prior art keywords
sample
task
delay
network
model
Prior art date
Legal status
Active
Application number
CN202211581385.6A
Other languages
Chinese (zh)
Other versions
CN115934192A
Inventor
崔玉亚
李鸿鹄
强豪
Current Assignee
Jiangsu Vocational College of Information Technology
Original Assignee
Jiangsu Vocational College of Information Technology
Priority date
Filing date
Publication date
Application filed by Jiangsu Vocational College of Information Technology
Priority to CN202211581385.6A
Publication of CN115934192A
Application granted
Publication of CN115934192B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a multi-type task collaborative offloading method for the Internet of Vehicles oriented to B5G/6G networks, and relates to the technical field of the Internet of Things. The method comprises the following steps: construction of a system model, including establishing a network model, an offloading model and a delay model; an adaptive weight experience replay mechanism, comprising a sample complexity model, a sample usage frequency model and a sample return value importance model; and an OCTDE-ISAC-based distributed task collaborative offloading method, comprising an ISAC-based solution and the OCTDE-ISAC architecture. The invention studies the problem of multi-type task offloading among edge servers in a B5G/6G environment and addresses the fact that existing system models consider only a single task type and ignore the diversity of different applications. It improves the convergence rate and stability of the algorithm by improving the traditional SAC, and proposes an offline centralized training, distributed execution offloading framework based on the ISAC algorithm to cope with the poor stability of the highly dynamic IoV environment; the proposed algorithm achieves better delay performance.

Description

B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method
Technical Field
The invention belongs to the technical field of the Internet of things, and particularly relates to a B5G/6G network-oriented multi-type task collaborative unloading method for the Internet of vehicles.
Background
With the rapid development of technologies such as big data, the Internet of Things and Vehicular Ad-hoc Networks (VANETs), the Internet of Vehicles (IoV) is becoming increasingly mainstream. IoV systems mainly comprise networked vehicles (equipped with various sensors such as radar, cameras and microcontrollers) and other infrastructure (such as roadside units) that provide technical support for unmanned and mobile vehicles. A vehicle is regarded as a routing node in a VANET and interconnects with neighboring vehicles or roadside units to form an ad hoc network. Most tasks in IoV are currently time-sensitive and delay-sensitive (e.g., path planning and hazard warning); however, the computing power of a vehicle is limited, and processing computing tasks only locally on the vehicle cannot achieve good performance. Therefore, applications with higher computational complexity are offloaded to resource-rich roadside units (edge servers). Compared with the 5G network, the B5G/6G network offers clear improvements in data transmission rate and delay, and vehicles based on B5G/6G networks can transmit data rapidly and massively while moving, which means a vehicle can offload a large number of computing tasks to an ES in a short time.
Despite the great advantages of mobile edge computing (MEC) in handling ultra-dense applications, many challenges remain when facing higher-rate B5G/6G networks and highly dynamic IoV environments: 1) the computing resources of an ES, although relatively rich compared with a vehicle's local resources, are still limited, so only a limited number of computing tasks can be executed; 2) Quality of Service (QoS) and Quality of Experience (QoE) both change dynamically in the IoV environment; 3) the B5G/6G network can provide ultra-reliable, faster communication services, with massive IoT devices constantly accessing a large number of advanced services.
Research on MEC mainly focuses on offloading, network resource optimization, and delay and energy consumption optimization, but much of this research addresses single-ES, multi-user scenarios, which cannot meet the needs of massive IoT devices. To solve this problem, many multi-user, multi-ES scenarios have been proposed. In a real complex scenario (e.g., IoV), the computing tasks distributed over the ESs are not uniform; in particular, when computing tasks on an ES time out, the high-load ES offloads computing tasks to low-load ESs through edge-to-edge collaboration, which effectively reduces task execution delay and improves resource utilization. Existing system models consider only a single task type and ignore the diversity of different applications. There is currently little research on collaborative offloading of multi-type tasks between edges in IoV based on B5G/6G networks, and designing an efficient offloading policy for multi-type tasks that meets the QoS of dynamic edge servers is very challenging due to the complexity and dynamics of the IoV environment.
Disclosure of Invention
The invention aims to solve the above problems and provides a multi-type task collaborative offloading method for the Internet of Vehicles oriented to B5G/6G networks, which solves the technical problems described in the background.
The invention is realized by the following technical scheme: a multi-type task cooperative unloading method of an Internet of vehicles facing a B5G/6G network comprises the following steps:
S1, constructing a system model, including establishing a network model, an offloading model and a delay model, used for experimental testing;
S2, an adaptive weight experience replay mechanism, including a sample complexity model, a sample usage frequency model and a sample return value importance model, where the latter incorporates the sample's system reward, information entropy and temporal difference error;
S3, an OCTDE-ISAC-based distributed task collaborative offloading method, including an ISAC-based solution and the OCTDE-ISAC architecture;
In the ISAC-based solution of S3, an ISAC algorithm is deployed on each ES m, using a parameterized soft Q function Q_θ and an offloading policy π_ζ with parameters {θ, ζ}. Samples are collected according to the adaptive weight experience replay mechanism proposed above, and both functions are approximated with deep neural networks; the parameter θ of the soft Q function is trained by minimizing the soft Bellman residual:
The stochastic gradient of this objective is used to update θ:
Similarly to the soft Q-function parameter θ, the policy parameter ζ is trained as follows:
Here the expectation is approximated with samples from the experience replay storage area D; samples generated according to π_ζ cannot back-propagate gradients to ζ, so a re-parameterization technique is used to rewrite the offloading action:
where f_ζ is a fixed distribution function and the second argument is a noise vector. According to equation (17), the integral over π_ζ is converted into an integral over the noise distribution, and the gradient with respect to ζ is solved as follows:
In the OCTDE-ISAC architecture of S3, the jth sample in the experience replay storage area of ES m combines the states and behaviors of multiple edge agents, and ES m learns its own Q function from the global information in the experience replay storage area.
OCTDE-ISAC can find an optimal offloading strategy as the number of computing tasks increases, offloading the computing tasks on high-load ESs to low-load ESs for execution, thereby achieving load balancing among the ESs.
Preferably, in step S1, establishing the network model comprises the following steps:
M edge servers (ESs), denoted M = {1, 2, 3, ..., M} with m ∈ M, are arranged in the B5G/6G network, and each vehicle is equipped with an NIB supporting B5G/6G;
the CPU frequency f_m (GHz) represents the computing capability of ES m, and the computing capabilities of all ESs form the set F = {f_1, f_2, ..., f_M}. The set of computing task types generated by vehicles at time t is denoted as T = {1, 2, ..., T}, and a class-n (n ∈ T) task is described by a tuple whose elements are:
the size of the computing task, measured in the CPU cycles it requires; the size of the input data; and the delay tolerance, i.e., the maximum delay the task can accept, whose value depends on the nature of the task.
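For illustration, the network and task model above can be captured with simple data structures, as in the hypothetical Python sketch below; the field names and the concrete numbers are assumptions for the example, not values taken from the patent.

```python
# Illustrative sketch only: the patent does not publish code, and the field
# names (cpu_cycles, data_size, delay_tolerance, loss_factor) are assumptions
# chosen to mirror the task tuple and ES set described above.
from dataclasses import dataclass
from typing import List

@dataclass
class TaskType:
    cpu_cycles: float       # size of the computing task, in required CPU cycles
    data_size: float        # size of the input data (e.g., in KB)
    delay_tolerance: float  # maximum acceptable delay for this task type (s)

@dataclass
class EdgeServer:
    es_id: int
    cpu_freq_ghz: float     # f_m, computing capability of ES m
    loss_factor: float      # performance loss factor used in the delay model

# M edge servers with heterogeneous CPU frequencies, and two example task types
edge_servers: List[EdgeServer] = [EdgeServer(m, 2.0 + 0.5 * m, 0.1) for m in range(4)]
task_types: List[TaskType] = [TaskType(5e8, 700.0, 0.5), TaskType(3e8, 500.0, 0.2)]
```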
Preferably, in step S1, the offloading model is established as follows:
a task offloaded to ES m can be further offloaded to other edge servers with stronger computing capability and smaller load;
the number of tasks on an edge server represents its load, and the load of ES m at time t is expressed as the sum of the loads of all task classes on it,
where the per-class term represents the load of class-n tasks on ES m at time t. The process of vehicles generating tasks is assumed to follow a Poisson distribution, and each computing task is assumed to be fine-grained and divisible in arbitrary proportions. The offloading policy of ES m at time t specifies the ratio of class-n tasks that ES m offloads to ES m_1 for execution, where m_1 ∈ M and m ≠ m_1.
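The Python sketch below illustrates, under stated assumptions, how the per-class loads and the offloading ratios interact: `loads[m, n]` plays the role of the class-n load on ES m, `alpha[m, n, m1]` is the fraction of those tasks sent to ES m1 (the diagonal entry is the share kept locally), and the Dirichlet draw is merely a convenient way to generate ratios that sum to one.

```python
import numpy as np

# Minimal sketch, not the patent's code: variable names and the random
# generation of loads and offloading ratios are assumptions for illustration.
M, N = 4, 3
rng = np.random.default_rng(0)
loads = rng.poisson(lam=5.0, size=(M, N)).astype(float)   # Poisson task arrivals per class
alpha = rng.dirichlet(np.ones(M), size=(M, N))            # alpha[m, n, :] sums to 1

def load_after_offloading(loads: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Redistribute fine-grained tasks according to the offloading ratios."""
    num_es, num_types = loads.shape
    new_loads = np.zeros_like(loads)
    for m in range(num_es):
        for n in range(num_types):
            for m1 in range(num_es):
                new_loads[m1, n] += loads[m, n] * alpha[m, n, m1]
    return new_loads

assert np.allclose(alpha.sum(axis=-1), 1.0)   # each (m, n) policy is a valid split
print(load_after_offloading(loads, alpha))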
Preferably, in step S1, establishing the delay model includes the computation delay and the communication delay;
1) Computation delay: an edge server suffers a loss of performance when executing tasks, so a loss factor is introduced for ES m; at time t, the effective computing capability of ES m is:
According to queuing theory, the queuing delay of a class-n task at ES m at time t is:
At time t, the processing delay of a class-n computing task at ES m is:
In the present model, the computation delay includes the queuing delay and the task processing delay; for a class-n task, the computation delay at ES m at time t is
The total computation delay of all tasks generated on ES m includes the delay of executing locally and the delay of offloading to other edge servers:
where χ_1 denotes local execution at ES m and χ_2 denotes offloading tasks on ES m to other edge servers for execution, with m ≠ m_1.
2) Communication delay: in the communication model, data between edge server nodes is transmitted through the radio access network, and the transmission rate between ES m and ES m_1 is denoted r_{m,m_1}, where m, m_1 ∈ M and m ≠ m_1. The transmission of tasks between two edge servers is also a queuing process that introduces additional queuing delay; this queuing process is modeled as an M/M/1 queue, and the queuing delay of a class-n task transmitted from ES m to ES m_1 at time t is:
where the cost term denotes the communication cost between ES m and ES m_1.
Tasks executed locally by ES m incur no communication delay, and the communication delay of tasks offloaded to other edge servers includes the queuing delay and the transmission delay:
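To make the bookkeeping concrete, the sketch below combines the pieces of this delay model under stated assumptions: the patent's exact formulas are not reproduced here, so the standard M/M/1 mean waiting time 1/(μ - λ) and a simple cycles-over-frequency processing time are used as stand-ins, and all function and variable names are illustrative.

```python
# A hedged sketch of the delay model; values and names are assumptions.
def mm1_queuing_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean waiting time of an M/M/1 queue (requires service_rate > arrival_rate)."""
    if service_rate <= arrival_rate:
        raise ValueError("queue is unstable: service rate must exceed arrival rate")
    return 1.0 / (service_rate - arrival_rate)

def computation_delay(cpu_cycles: float, cpu_freq_hz: float, loss_factor: float,
                      arrival_rate: float, service_rate: float) -> float:
    """Queuing delay plus processing delay on the executing edge server."""
    effective_freq = cpu_freq_hz * (1.0 - loss_factor)   # performance loss of the ES
    return mm1_queuing_delay(arrival_rate, service_rate) + cpu_cycles / effective_freq

def communication_delay(data_size_bits: float, rate_bps: float,
                        arrival_rate: float, service_rate: float) -> float:
    """Transmission queuing delay plus transmission delay between two ESs."""
    return mm1_queuing_delay(arrival_rate, service_rate) + data_size_bits / rate_bps

# Example: a 500-megacycle task on a 2 GHz ES with 10% loss, plus a 5.6 Mb transfer
total = (computation_delay(5e8, 2e9, 0.1, arrival_rate=4.0, service_rate=10.0)
         + communication_delay(5.6e6, 50e6, arrival_rate=4.0, service_rate=10.0))
print(f"end-to-end delay ~= {total:.3f} s")
```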
preferably, in the step of complexity of the samples in S2, the SAC randomly extracts samples from the playback buffer, ignores differences between samples to reduce sampling efficiency of the samples, assigns different priority weights to samples in the empirical storage area, and assigns complexity SC of sample j j Mainly includes the frequency of use SF (usage) j ) Sample return value importance function IR (MR j ,BE j );
SC_j = IR(MR_j, BE_j) + ω·SF(usage_j)    (12);
where ω is a hyper-parameter. The sampling probability SP_j of sample j is rewritten as:
To prevent overfitting, an exponential randomization factor is added: when it equals 1 the sampling is fully prioritized, and when it equals 0 the sampling is uniform. An importance sampling weight ψ_j is used to correct the distribution error caused by sampling directly from the replay storage;
where β represents the compensation coefficient and D represents the experience replay buffer that stores historical experiences.
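A small Python sketch of this adaptive-weight replay bookkeeping follows. The exact form of SF(usage_j) and the symbol of the randomization exponent are not reproduced in the text above, so the decreasing form p/(usage_j + q) and the name `iota` are assumptions; only the combination of SC_j, SP_j and ψ_j mirrors the description.

```python
import numpy as np

# Sketch under stated assumptions; not the patent's implementation.
def usage_frequency_term(usage: np.ndarray, p: float = 1.0, q: float = 1.0) -> np.ndarray:
    # Assumed SF(usage_j): decreases as a sample is replayed more often
    return p / (usage + q)

def sample_complexity(ir: np.ndarray, usage: np.ndarray, omega: float = 0.5) -> np.ndarray:
    # SC_j = IR(MR_j, BE_j) + omega * SF(usage_j)    (cf. equation (12))
    return ir + omega * usage_frequency_term(usage)

def sampling_probabilities(sc: np.ndarray, iota: float = 0.6) -> np.ndarray:
    # iota = 0 -> uniform sampling, iota = 1 -> fully prioritized sampling
    powered = np.power(sc, iota)
    return powered / powered.sum()

def importance_weights(probs: np.ndarray, beta: float = 0.4) -> np.ndarray:
    # psi_j corrects the bias of non-uniform sampling; beta is the compensation coefficient
    w = np.power(len(probs) * probs, -beta)
    return w / w.max()                      # normalize for stable updates

ir = np.array([0.8, 0.1, 0.4, 0.9])         # IR(MR_j, BE_j) values of 4 stored samples
usage = np.array([5.0, 0.0, 2.0, 1.0])      # how often each sample has been replayed
probs = sampling_probabilities(sample_complexity(ir, usage))
print(probs, importance_weights(probs))
```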
Preferably, in the sample usage frequency model of S2, to avoid the overfitting phenomenon of SAC, a sample usage frequency function SF(usage_j) is considered: when the frequency with which a sample has been selected is low, its probability of being selected next is lower, and conversely, the probability is higher;
where p and q represent constants greater than 0 and usage_j denotes the usage frequency of sample j.
Preferably, in the sample return value importance model of S2, the sample return value importance function IR(MR_j, BE_j) mainly comprises two parameters: the multi-modal reward (system reward and information entropy) MR_j of sample j and the TD error BE_j. It is expressed as:
IR(MR_j, BE_j) = |BE_j| · RW(MR_j) + κ    (16);
where κ is a small positive number that prevents sample j from becoming impossible to sample when BE_j = 0; BE_j is expressed as:
where Q_θ(X_j, A_j) and π_ζ(A_j | X_j) respectively denote the parameterized soft Q function and the offloading policy of the jth sample, together with a target soft Q function; X_j, A_j and R_j respectively denote the state, offloading action and reward of sample j, σ is the temperature parameter, i.e., the weight of the entropy, γ ∈ [0,1] is the discount factor, RW(MR_j) is the weight of the multi-modal reward with RW(MR_j) > 0, and for the stability of the algorithm MR_j ∈ [-1, 1];
where MR_j = R_j + σ·H(π(A_j | X_j)).
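The sketch below shows how the TD error BE_j, the multi-modal reward MR_j and the importance IR(MR_j, BE_j) could be combined in code. It is a hedged illustration: the tensors stand in for outputs of the parameterized networks, the one-step soft Bellman form of BE_j follows the surrounding definitions, and the positive weighting RW(MR_j) = exp(MR_j) is an assumption chosen only to satisfy RW(MR_j) > 0.

```python
import torch

# Illustrative only; network outputs are passed in as plain tensors.
def multimodal_reward(reward: torch.Tensor, entropy: torch.Tensor, sigma: float) -> torch.Tensor:
    # MR_j = R_j + sigma * H(pi(A_j | X_j)), clipped to [-1, 1] for stability
    return torch.clamp(reward + sigma * entropy, -1.0, 1.0)

def td_error(reward: torch.Tensor, entropy: torch.Tensor, soft_q: torch.Tensor,
             next_target_q: torch.Tensor, sigma: float, gamma: float = 0.99) -> torch.Tensor:
    # BE_j: one-step soft TD error of sample j
    soft_target = reward + sigma * entropy + gamma * next_target_q
    return soft_target - soft_q

def importance(reward, entropy, soft_q, next_target_q, sigma: float = 0.2,
               reward_weight: float = 1.0, kappa: float = 1e-3) -> torch.Tensor:
    be = td_error(reward, entropy, soft_q, next_target_q, sigma)
    mr = multimodal_reward(reward, entropy, sigma)
    rw = reward_weight * torch.exp(mr)      # assumed positive weighting RW(MR_j) > 0
    # IR(MR_j, BE_j) = |BE_j| * RW(MR_j) + kappa
    return be.abs() * rw + kappa

r, ent = torch.tensor([0.3]), torch.tensor([1.2])
print(importance(r, ent, soft_q=torch.tensor([0.5]), next_target_q=torch.tensor([0.4])))
```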
The beneficial effects of the invention are as follows:
the invention mainly designs a multi-type task collaborative unloading method of the Internet of vehicles for a B5G/6G network, in the method, the problem of multi-type task unloading among edge servers in the B5G/6G environment is studied, the problem that the existing system model only considers single type tasks and ignores diversity characteristics of different application programs is solved, the convergence rate and stability of an algorithm are improved by improving the traditional SAC (Soft Critic Actor), an offline centralized training distributed execution unloading framework is provided based on an ISAC algorithm to solve the problem that the stability is poor in IoV dynamic performance, and the proposed algorithm is better in delay aspect.
Drawings
FIG. 1 is a system model diagram of the present invention;
FIG. 2 is an algorithm convergence diagram of the present invention;
FIG. 3 is a histogram of learning efficiency of the present invention;
FIG. 4 is a graph of the computing power of different ES versus average delay for the present invention;
FIG. 5 is a graph of different computational task sizes versus average delay for the present invention;
FIG. 6 is a graph of the present invention for different numbers of vehicles versus average delay;
FIG. 7 is a graph of the number of different task types versus average delay of the present invention;
FIG. 8 is a flow chart of the Internet of vehicles multi-type task collaboration offloading method for the B5G/6G network.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples:
referring to fig. 8, a method for cooperatively unloading multi-type tasks of internet of vehicles facing to a B5G/6G network includes the following steps:
S1, constructing a system model, including establishing a network model, an offloading model and a delay model, used for experimental testing;
S2, an adaptive weight experience replay mechanism, including a sample complexity model, a sample usage frequency model and a sample return value importance model, where the latter incorporates the sample's system reward, information entropy and temporal difference error;
S3, an OCTDE-ISAC-based distributed task collaborative offloading method, including an ISAC-based solution and the OCTDE-ISAC architecture;
In the ISAC-based solution step, an ISAC algorithm is deployed on each ES m, using a parameterized soft Q function Q_θ and an offloading policy π_ζ with parameters {θ, ζ}. Samples are collected according to the adaptive weight experience replay mechanism and both functions are approximated with deep neural networks; the parameter θ of the soft Q function is trained by minimizing the soft Bellman residual:
The stochastic gradient of this objective is used to update θ:
Similarly to the soft Q-function parameter θ, the policy parameter ζ is trained as follows:
Here the expectation is approximated with samples from the experience replay storage area D; samples generated according to π_ζ cannot back-propagate gradients to ζ, so a re-parameterization technique is used to rewrite the offloading action:
where f_ζ is a fixed distribution function and the second argument is a noise vector. According to equation (17), the integral over π_ζ is converted into an integral over the noise distribution, and the gradient with respect to ζ is solved as follows:
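For orientation, a minimal PyTorch sketch of these two updates is given below. It is not the patent's code: the QNet/PolicyNet architectures, the Gaussian reparameterization with tanh squashing, and all hyperparameters (sigma, gamma, hidden sizes) are assumptions used only to make the soft Bellman residual and the reparameterized policy objective concrete.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=-1)).squeeze(-1)

class PolicyNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu, self.log_std = nn.Linear(hidden, act_dim), nn.Linear(hidden, act_dim)
    def sample(self, x):
        h = self.body(x)
        mu, std = self.mu(h), self.log_std(h).clamp(-5, 2).exp()
        eps = torch.randn_like(mu)            # noise vector
        a = torch.tanh(mu + std * eps)        # reparameterized action keeps gradients w.r.t. zeta
        log_prob = torch.distributions.Normal(mu, std).log_prob(mu + std * eps).sum(-1)
        log_prob -= torch.log(1 - a.pow(2) + 1e-6).sum(-1)   # tanh squashing correction
        return a, log_prob

def critic_loss(q, q_target, policy, batch, sigma=0.2, gamma=0.99):
    x, a, r, x2 = batch                       # batch drawn via the adaptive-weight replay
    with torch.no_grad():
        a2, logp2 = policy.sample(x2)
        target = r + gamma * (q_target(x2, a2) - sigma * logp2)   # soft Bellman backup
    return ((q(x, a) - target) ** 2).mean()   # soft Bellman residual J_Q(theta)

def actor_loss(q, policy, x, sigma=0.2):
    a, logp = policy.sample(x)                # reparameterized sampling enables backprop to zeta
    return (sigma * logp - q(x, a)).mean()    # maximum-entropy policy objective J_pi(zeta)
```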
In the OCTDE-ISAC architecture, the jth sample in the experience replay storage area of ES m combines the states and behaviors of multiple edge agents, and ES m learns its own Q function from the global information in the experience replay storage area;
OCTDE-ISAC can find an optimal offloading strategy as the number of computing tasks increases, offloading the computing tasks on high-load ESs to low-load ESs for execution, thereby achieving load balancing among the ESs;
In the step of building the network model, M edge servers (ESs), denoted M = {1, 2, 3, ..., M} with m ∈ M, are arranged in the B5G/6G network, and each vehicle is equipped with an NIB supporting B5G/6G. To guarantee Quality of Service (QoS) and user data privacy, when some ES nodes are overloaded or too slow, computing tasks are further offloaded to other ES nodes with lower load rather than to the cloud center. Each ES m, m ∈ M, provides computation offloading services for the vehicles within its coverage, and a vehicle uses this service to offload tasks to the ES to improve task processing performance. The CPU frequency f_m (GHz) represents the computing capability of ES m, so the computing capabilities of all ESs form the set F = {f_1, f_2, ..., f_M}; within one time slot, each vehicle can only connect to one edge server.
A vehicle runs multiple applications (such as audio-visual entertainment, path planning and speech recognition), and each application can generate different types of tasks. The set of computing task types generated by vehicles at time t is denoted as T = {1, 2, ..., T}, and a class-n (n ∈ T) task is described by a tuple whose elements are: the size of the computing task, measured in the CPU cycles it requires; the size of the input data; and the delay tolerance, i.e., the maximum delay the task can accept. The value of the delay tolerance depends on the characteristics of the task (task amount, response delay, computational complexity, etc.) and is obtained by analyzing and sampling the execution of the specific task.
In the step of establishing the offloading model, a task offloaded to ES m can be further offloaded to other edge servers with stronger computing capability and smaller load, which improves task processing capability and enables load balancing among the edge servers. The load of an edge server increases with the task scale, and ES m has a maximum amount of computing resources. The load of an edge server is represented by the number of tasks on it, and the load of ES m at time t is the sum of the loads of all task classes on it, where the per-class term represents the load of class-n tasks on ES m at time t. The process of vehicles generating tasks is assumed to follow a Poisson distribution, and in the present invention each computing task is assumed to be fine-grained and divisible in arbitrary proportions. The offloading policy of ES m at time t specifies the ratio of class-n tasks that ES m offloads to ES m_1 for execution, where m_1 ∈ M and m ≠ m_1.
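A toy traffic generator consistent with the stated Poisson assumption is sketched below; the arrival rates and the way the most loaded ES is picked are illustrative, not the patent's procedure.

```python
import numpy as np

# Hypothetical parameters; only the Poisson arrival assumption comes from the text.
rng = np.random.default_rng(7)
NUM_ES, NUM_TYPES, SLOTS = 4, 5, 3
arrival_rate = rng.uniform(1.0, 4.0, size=(NUM_ES, NUM_TYPES))   # tasks per slot per class

for t in range(SLOTS):
    new_tasks = rng.poisson(arrival_rate)   # class-n tasks arriving at each ES in slot t
    load = new_tasks.sum(axis=1)            # l_m(t): total tasks queued at ES m
    overloaded = int(np.argmax(load))       # candidate source for edge-to-edge offloading
    print(f"slot {t}: loads {load.tolist()}, most loaded ES = {overloaded}")
```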
In the step of establishing the delay model, the delay model includes the computation delay and the communication delay; delay is an important factor influencing the vehicle task offloading decision;
1) Computation delay: an edge server suffers a loss of performance when executing tasks, so a loss factor is introduced for ES m; at time t, the effective computing capability of ES m is:
According to queuing theory, the queuing delay of a class-n task at ES m at time t is:
At time t, the processing delay of a class-n computing task at ES m is:
In the present model, the computation delay includes the queuing delay and the task processing delay; for a class-n task, the computation delay at ES m at time t is
The total computation delay of all tasks generated on ES m includes the delay of executing locally and the delay of offloading to other edge servers:
where χ_1 denotes local execution at ES m and χ_2 denotes offloading tasks on ES m to other edge servers for execution, with m ≠ m_1.
2) Communication delay: in the communication model, data between edge server nodes is transmitted through the radio access network, and the transmission rate between ES m and ES m_1 is denoted r_{m,m_1}, where m, m_1 ∈ M and m ≠ m_1. The transmission of tasks between two edge servers is also a queuing process that introduces additional queuing delay; this queuing process is modeled as an M/M/1 queue, and the queuing delay of a class-n task transmitted from ES m to ES m_1 at time t is:
where the cost term denotes the communication cost between ES m and ES m_1.
Tasks executed locally by ES m incur no communication delay, and the communication delay of tasks offloaded to other edge servers includes the queuing delay and the transmission delay:
In the sample complexity step, SAC randomly extracts samples from the replay buffer and ignores the differences between samples, which reduces the sampling efficiency. For a neural network, samples of high complexity are difficult to understand in the early stage of learning, while samples of low complexity contribute little to the learning of the neural network. The invention therefore improves the traditional sampling method with an adaptive weight experience replay mechanism: the samples in the experience storage area are assigned different priority weights, and the complexity SC_j of sample j mainly consists of the usage frequency SF(usage_j) and the sample return value importance function IR(MR_j, BE_j),
SC_j = IR(MR_j, BE_j) + ω·SF(usage_j)    (12);
where ω is a hyper-parameter. The sampling probability SP_j of sample j is rewritten as:
To prevent overfitting, an exponential randomization factor is added: when it equals 1 the sampling is fully prioritized, and when it equals 0 the sampling is uniform. An importance sampling weight ψ_j is used to correct the distribution error caused by sampling directly from the replay storage;
where β represents the compensation coefficient and D represents the experience replay buffer that stores historical experiences.
Step sample using frequency model, in order to avoid SAC over fitting phenomenon, consider sample using frequency function SF (usage j ) When the frequency with which the sample is selected is low, then the probability of the next selection is lower and vice versa,
wherein p and q represent constants greater than 0, use j Indicating the frequency of use of sample j.
Step sample return value importance model, sample return value importance function IR (MR j ,BE j ) Mainly comprises two parameters: multi-modal rewards (System rewards and information entropy) MR of sample j j TD error BE j It is represented as follows:
IR(MR j ,BE j )=|BE j |*RW(MR j )+κ (16);
wherein, κ represents a small positive number, and κ is prevented when BE is used j When=0, the sampling is impossible, BE j Expressed as:
wherein,and pi ζ (A j |X j ) The parameterized soft Q function and offloading policy for the jth sample are represented separately,is a target soft Q function, X j ,A j ,R j The sub-table represents the state of sample j, the offload strategy, the reward, σ is the weight of the temperature parameter, i.e., entropy, γε [0,1]For discounting factor, RW (MR j ) Is a weight of a multimodal bonus and RW (MR j ) > 0, MR for stability of algorithm j ∈[-1,1];
Wherein MR is j =R j +σH(π(A j |X j ));
Step OCTDE-ISAC architecture to solve the interaction of vehicles and environment and instability of Internet of vehicles environment, the invention proposes an ISAC-based offline centralized training distribution execution offload architecture (OCTDE-ISAC), which stores the local observations of edge agents and the states and behaviors of other agents in an experience playback buffer during an offline centralized training phase to account for the effects among multiple users, and facilitates the collaboration among edge agents to increase the number of training samples, so the jth sample in the experience playback storage area of ES m is expressed asBecause of the state and behavior of multiple edge agents are combined +.>ES m learns its own Q function based on the global information in the empirically played back storage region, and, in offlineThe training environment is fixed for each ES in the centralized training stage, so that the influence on algorithm convergence caused by dynamic change of the environment is effectively avoided, and in the unloading decision stage, an Actor only needs to observe the local state +.>The method can make optimal decisions under the condition that other agents in the environment are not known, according to the self-adaptive weight acquisition mechanism, an Actor network learns an unloading strategy through a fully-connected network, and then the unloading strategy of the Actor is evaluated through two Critic networks to improve training efficiency, and the Critic network is also set to be a fully-connected network like the Actor network.
In this example, algorithm performance is verified through simulation experiments. For each ES, the Actor network and the Critic network are represented by multi-layer perceptron models, each comprising an input layer, an output layer and 3 fully connected hidden layers with 256, 256 and 128 neurons respectively, activated by the GLU function; in addition, the optimizer adopted by the invention is Adam;
1) Experiment A: convergence of the algorithm and the average delay of the system at different learning rates;
2) Experiment B: the performance of the 5 algorithms under different parameters, with the computing power of the ES increased from 20 to 60, the size of the computing task increased from 500 KB to 900 KB, the number of vehicles increased from 60 to 140, and the number of task types increased from 5 to 10;
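As a rough reconstruction of the network shape described in the simulation setup (three fully connected hidden layers of 256, 256 and 128 units with GLU activation, trained with Adam), a hypothetical PyTorch sketch could look as follows; the input/output dimensions and the 1e-4 learning rate are illustrative placeholders.

```python
import torch
import torch.nn as nn

def glu_mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    # Hidden sizes taken from the description; each Linear outputs 2h so GLU halves it back to h
    layers, hidden = [], [256, 256, 128]
    prev = in_dim
    for h in hidden:
        layers += [nn.Linear(prev, 2 * h), nn.GLU(dim=-1)]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

actor = glu_mlp(in_dim=8, out_dim=4)       # e.g., local state -> offloading ratios over 4 ESs
critic = glu_mlp(in_dim=8 + 4, out_dim=1)  # (state, action) -> soft Q value
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-4)
```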
the experimental parameters are shown in table 1,
table 1 experimental parameters
The present example will consider three performance indicators, which are:
1. The convergence of the algorithm: the speed and stability of convergence as the number of training episodes increases.
2. The effect of different learning rates on the average delay of the system.
3. The average delays of the 5 algorithms in different scenarios, compared while keeping the other parameters unchanged: different ES computing capacities, different computing task sizes, different numbers of vehicles and different numbers of task types.
The simulation experiment results of this example are as follows:
1. convergence of algorithm
As shown in FIG. 2, ISAC obtains stable returns in a shorter time. This is because the Actor network of ISAC adopts a stochastic policy gradient to generate a probability distribution over continuous actions in each state instead of a deterministic action, which helps explore more useful regions of the action space; ISAC adopts maximum entropy learning to obtain the optimal policy, and the maximum entropy term makes the probability distribution of continuous actions more uniform so that the optimal policy can be learned; moreover, ISAC considers the differences between samples and improves convergence efficiency and the accuracy of the Q value estimate through the adaptive weight experience replay mechanism, thereby obtaining a better policy.
2. Influence of different learning rates on system average delay
The effect of the learning rate on OCTDE-ISAC was studied using an exponential decay/increase method, as shown in FIG. 3. The performance of the algorithm suffers when the learning rate is too high or too low: the average delay of the system is lowest when the learning rate is 0.0001; a smaller learning rate reduces the convergence speed of the algorithm, while a higher learning rate converges quickly but is unstable and cannot reach the optimal offloading strategy.
3. Comparing the average delay of 5 algorithms under different scenes
As shown in FIG. 4, as the computing power of the ES increases, the average delay of all algorithms decreases, because more computing tasks are offloaded to the ESs for execution, which reduces the average delay of the system. Note that when the computing power of the ES is within [20-50], the service capability of the ES has a significant impact on the average delay, whereas when it is within [50-60], each ES has sufficient computing resources to handle the computing tasks and the impact on the average delay is small. Compared with the other algorithms, the average delay of OCTDE-ISAC is the lowest, because OCTDE-ISAC builds an improved algorithm (ISAC) on the basis of SAC: it increases sampling efficiency and the speed and stability of convergence by taking the importance of samples into account through the adaptive weight sampling method, adopts offline centralized training to reduce the interactions between vehicles and effectively cope with the instability of the IoV environment, and uses the interactions between individual vehicles and the group to approximate the interactions between vehicles and the environment, so a lower average delay can be obtained. OCTDE-SAC ignores the differences between samples, so its average delay is higher than that of OCTDE-ISAC. CSACO adopts a centralized processing mode, but the central agent needs to collect the states of all agents and then distribute the joint offloading action to each ES at every training step, which leads to a large amount of communication overhead. DSACO adopts a distributed processing method, but this method is not suitable for a highly dynamic scenario such as IoV; the algorithm does not converge easily and is unstable when vehicles move at high speed. RO adopts a random offloading strategy without any optimization, so its average delay is the highest.
FIG. 5 shows the effect of the size of the computing task on OCTDE-ISAC, OCTDE-SAC, RO, CSACO and DSACO. The average delay of the 5 algorithms increases significantly as the size of the computing task grows, because the transmission delay increases with the input data. As the size of the computing task increases, the average delay of OCTDE-ISAC at 900 KB is about 0.66 s, which is lower than that of OCTDE-SAC, CSACO and DSACO by 20%, 48% and 44% respectively. OCTDE-ISAC can find an optimal offloading strategy as the number of computing tasks increases, offloading the computing tasks on high-load ESs to low-load ESs for execution, thereby achieving load balancing among the ESs.
FIG. 6 shows the effect of the number of vehicles on the average delay of the 5 algorithms. The average delay of all 5 algorithms increases as the number of vehicles increases, because more vehicles generate more tasks to be processed; the OCTDE-ISAC curve grows the slowest and performs better than the other four algorithms, for the same reasons as in FIG. 4.
FIG. 7 shows the relationship between the number of task types and the average delay. In this model, the number of task types T is set to 5-10; as the number of task types increases, the states of some ESs gradually change from low load to high load and finally approach full load. FIG. 7 shows that the average delay of all algorithms increases with the number of task types, but the proposed offloading algorithm grows the slowest, and the gap in average delay between OCTDE-ISAC and the other algorithms increases from 0.62 s to at most 2.56 s, so the load balancing effect of OCTDE-ISAC becomes more obvious as the task scale increases. The method designed in this example was implemented with PyTorch 1.7.0 and Python 3.6; the hardware environment was a GPU-based server with 32 GB of 1600 MHz DDR3 memory, a 2.8 GHz Intel Core i9 and 2 TB of memory. In the IoV environment, the coverage of an ES is considered circular with a radius of 1 km, and there are [4-12] fixed-location edge servers (ESs) in the simulated environment; each ES receives computation-intensive and delay-sensitive tasks offloaded by vehicles, and the computing capacity of each ES is limited.

Claims (7)

1. A B5G/6G network-oriented multi-type task collaborative offloading method for the Internet of Vehicles, characterized by comprising the following steps:
S1, constructing a system model, including establishing a network model, an offloading model and a delay model, used for experimental testing;
S2, constructing an adaptive weight experience replay mechanism, including calculating the sample complexity, constructing a sample usage frequency model and constructing a sample return value importance model;
S3, creating a distributed task collaborative offloading method based on an OCTDE-ISAC (Offline Centralized Training Distributed Execution ISAC) architecture, including calculating the solution of the ISAC (Improved Soft Actor Critic) algorithm and constructing the OCTDE-ISAC architecture;
the solution of the ISAC algorithm is calculated in S3 as follows: the ISAC algorithm is deployed on each edge server ES m (Edge Server) and comprises a parameterized soft Q function Q_θ and an offloading policy function π_ζ with parameters {θ, ζ}; the parameter θ of the soft Q function is trained by minimizing the soft Bellman residual, and the training loss function of the soft Q function is:
wherein the loss involves a target soft Q function, the state of ES m at time t, and the offloading action of ES m at time t;
the gradient is calculated as:
wherein the objective involves the reward of ES m at time t, γ ∈ [0,1] is a discount factor, and σ is a temperature parameter;
similarly to the soft Q-function parameter θ, the parameter ζ is trained as follows:
the SAC (Soft Actor Critic) algorithm achieves maximum entropy by adding an entropy term to the soft Q function and therefore converges better; equation (16) is the loss function of the Actor network, and the optimal ζ is obtained through continuous iteration; the expectation is approximated by sampling from the experience buffer D, and samples generated according to π_ζ cannot back-propagate gradients to ζ, so a re-parameterization technique is adopted to rewrite the offloading policy:
wherein the re-parameterization technique is a method of sampling from a parameterized distribution whose effect is to separate out the randomness of the random variable, f_ζ is a fixed distribution function and the second argument is a noise vector; the integral over the offloading policy π_ζ is converted into an integral over the noise distribution, and the gradient with respect to ζ is solved as follows:
2. The method for collaborative offloading of the Internet of Vehicles for B5G/6G networks according to claim 1, wherein in S1 a network model is built: M edge servers, denoted M = {1, 2, 3, ..., M} with m ∈ M, are provided in the B5G/6G network, and each vehicle deploys a network box (NIB) supporting B5G/6G;
the CPU frequency f_m (GHz) represents the computing capability of ES m, and the computing capabilities of all ESs form the set F = {f_1, f_2, ..., f_M}; the set of computing task types generated by vehicles at time t is denoted as T = {1, 2, ..., T}, and a class-n (n ∈ T) task may be expressed as a tuple whose elements are: the size of the computing task, measured in the CPU cycles it requires; the size of the input data; and the delay tolerance, i.e., the maximum delay the task can accept, whose value depends on the nature of the task.
3. The method for collaborative offloading of the Internet of Vehicles for B5G/6G networks according to claim 2, wherein in S1 an offloading model is built to offload the computing tasks on ES m to other edge servers; ES m has a maximum amount of computing resources; the load of an edge server is represented by the number of tasks on it, and the load of ES m at time t is the sum of the loads of all task classes on it, where the per-class term represents the load of class-n tasks on ES m at time t; the process of vehicles generating tasks is assumed to follow a Poisson distribution; in the invention, each computing task is assumed to be fine-grained and divisible in arbitrary proportions; the offloading policy of ES m at time t specifies the ratio of class-n tasks that ES m offloads to ES m_1 for execution, where m_1 ∈ M and m ≠ m_1, and ES m_1 denotes the target edge server to which the computing task is offloaded from edge server m at time t.
4. The method for collaborative offloading of the Internet of Vehicles for B5G/6G networks according to claim 3, wherein establishing the delay model in S1 includes the computation delay and the communication delay;
1) Computation delay: an edge server suffers a loss of performance when executing tasks, so a loss factor is introduced for ES m; at time t, the effective computing capability of ES m is expressed as:
according to queuing theory, the queuing delay of a class-n task at ES m at time t is expressed as:
at time t, the processing delay of a class-n computing task at ES m is:
wherein the computation delay includes the queuing delay and the task processing delay, and for a class-n task the computation delay at ES m at time t is
the total computation delay of all tasks generated on ES m includes the delay of executing locally and the delay of offloading to other edge servers:
where χ_1 denotes local execution at ES m and χ_2 denotes offloading tasks on ES m to other edge servers for execution, with m ≠ m_1, and the corresponding terms respectively represent the queuing time and the computation time of a class-n task at ES m_1;
2) Communication delay: in the communication model, data between edge server nodes is transmitted through the radio access network, and the transmission rate between ES m and ES m_1 is expressed as r_{m,m_1}, where m, m_1 ∈ M and m ≠ m_1; the transmission of tasks between two edge servers is also a queuing process that introduces additional queuing delay; this queuing process is modeled as an M/M/1 queue, and the queuing delay of a class-n task transmitted from ES m to ES m_1 at time t is:
where the cost term denotes the communication cost between ES m and ES m_1;
tasks executed locally by ES m incur no communication delay, and the communication delay of tasks offloaded to other edge servers includes the queuing delay and the transmission delay:
5. The method for collaborative offloading of the Internet of Vehicles for B5G/6G networks according to claim 1, wherein in S2 the sample complexity is calculated; the SAC algorithm randomly extracts samples from the replay buffer and ignores the differences between samples, thereby reducing the sampling efficiency; the invention assigns different priority weights to the samples in the experience storage area, and the complexity SC_j of sample j mainly consists of the usage frequency SF(usage_j) and the sample return value importance function IR(MR_j, BE_j), wherein MR_j includes the reward and entropy of sample j and BE_j represents the TD (Temporal Difference) error of sample j;
SC_j = IR(MR_j, BE_j) + ωSF(usage_j)    (7);
where ω is a hyper-parameter, and the sampling probability SP_j of sample j is rewritten as:
wherein an exponential randomization factor is used: when it equals 1 the sampling is prioritized, and when it equals 0 the sampling is uniform, the denominator being the sum of the complexities of all samples; an importance sampling weight parameter ψ_j is used to correct the distribution error caused by sampling directly from the replay storage;
wherein β represents the compensation coefficient, and D represents the experience replay buffer storing historical experiences;
in S2 a sample usage frequency model is constructed; to avoid the overfitting phenomenon of the SAC algorithm, a sample usage frequency function SF(usage_j) is considered: when the frequency with which a sample has been selected is low, its probability of being selected next is lower, and conversely, the probability is higher;
where p and q represent constants greater than 0 and usage_j denotes the usage frequency of sample j.
6. The method for collaborative offloading of the Internet of Vehicles for B5G/6G networks according to claim 5, wherein the sample return value importance model of S2 is expressed as:
IR(MR_j, BE_j) = |BE_j| * RW(MR_j) + κ    (11);
where κ represents a small positive number that prevents sample j from becoming impossible to sample when BE_j = 0; BE_j can be expressed as follows:
where the parameterized soft Q function and the offloading policy function of the jth sample at the next time slot are used, together with the target soft Q function of sample j at the next time slot; X_j, A_j and R_j respectively represent the state, action and reward of sample j, together with the state and action of sample j at the next time instant; σ is the temperature parameter, i.e., the weight of the entropy, and γ ∈ [0,1] is the discount factor; RW(MR_j) is the weight of MR_j with RW(MR_j) > 0, and for the stability of the algorithm MR_j ∈ [-1, 1];
where MR_j = R_j + σH(π(A_j|X_j)), and H(π(A_j|X_j)) represents the entropy of the policy π selected in the state of sample j.
7. The B5G/6G network-oriented multi-type task collaborative offloading method according to claim 6, wherein constructing the OCTDE-ISAC architecture in S3 is as follows: the jth sample in the experience replay storage area of ES m combines the states and behaviors of multiple edge agents, including the states and behaviors of all ESs and the states and behaviors of all ESs at the next time instant, and ES m learns its own soft Q function from the global information in the experience replay storage area; in addition, the training environment is fixed for each ES during the offline centralized training phase, which effectively avoids the impact of dynamic environment changes on algorithm convergence; in the offloading decision phase, an Actor only needs to observe its local state, which consists of the load of ES m at time t and the maximum available computing resources of ES m, and can make optimal decisions without knowledge of the other agents in the environment; according to the adaptive weight acquisition mechanism, the Actor network learns the offloading policy through a fully connected network, and two Critic networks then evaluate the Actor's offloading policy to improve training efficiency; like the Actor network, each Critic network is also a fully connected network.
CN202211581385.6A 2022-12-07 2022-12-07 B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method Active CN115934192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211581385.6A CN115934192B (en) 2022-12-07 2022-12-07 B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211581385.6A CN115934192B (en) 2022-12-07 2022-12-07 B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method

Publications (2)

Publication Number Publication Date
CN115934192A CN115934192A (en) 2023-04-07
CN115934192B true CN115934192B (en) 2024-03-26

Family

ID=86653655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211581385.6A Active CN115934192B (en) 2022-12-07 2022-12-07 B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method

Country Status (1)

Country Link
CN (1) CN115934192B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786839B (en) * 2020-07-15 2021-09-07 南通大学 Calculation unloading method and system for energy efficiency optimization in vehicle-mounted edge calculation network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113573324A (en) * 2021-07-06 2021-10-29 河海大学 Cooperative task unloading and resource allocation combined optimization method in industrial Internet of things
CN114285853A (en) * 2022-01-14 2022-04-05 河海大学 Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN115934192A (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Sun et al. Adaptive federated learning and digital twin for industrial internet of things
Zhang et al. Optimizing federated learning in distributed industrial IoT: A multi-agent approach
Shi et al. Mean field game guided deep reinforcement learning for task placement in cooperative multiaccess edge computing
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN109818786B (en) Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
Wu et al. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT
Chen et al. Cache-assisted collaborative task offloading and resource allocation strategy: A metareinforcement learning approach
WO2024032121A1 (en) Deep learning model reasoning acceleration method based on cloud-edge-end collaboration
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN113760511A (en) Vehicle edge calculation task unloading method based on depth certainty strategy
Fang et al. Smart collaborative optimizations strategy for mobile edge computing based on deep reinforcement learning
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
Cui et al. Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network
Liu et al. Hastening stream offloading of inference via multi-exit dnns in mobile edge computing
Chuang et al. A real-time and ACO-based offloading algorithm in edge computing
CN113973113A (en) Distributed service migration method facing mobile edge computing
Wei et al. Event-driven computation offloading in IoT with edge computing
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
CN115934192B (en) B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method
Wang et al. Modeling on resource allocation for age-sensitive mobile edge computing using federated multi-agent reinforcement learning
Wang et al. On Jointly Optimizing Partial Offloading and SFC Mapping: A Cooperative Dual-Agent Deep Reinforcement Learning Approach
CN115225512B (en) Multi-domain service chain active reconfiguration mechanism based on node load prediction
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Jin et al. Computation offloading and resource allocation for MEC in C-RAN: A deep reinforcement learning approach
Consul et al. A Hybrid Task Offloading and Resource Allocation Approach For Digital Twin-Empowered UAV-Assisted MEC Network Using Federated Reinforcement Learning For Future Wireless Network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant