CN117793665A - Internet of vehicles computing task unloading method and device - Google Patents

Internet of vehicles computing task unloading method and device

Info

Publication number
CN117793665A
CN117793665A
Authority
CN
China
Prior art keywords
task
vehicle
tasks
unloading
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311805132.7A
Other languages
Chinese (zh)
Inventor
苏圣超
何蓓蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Engineering Science
Priority to CN202311805132.7A
Publication of CN117793665A
Legal status: Pending

Abstract

The invention relates to the technical field of the Internet of vehicles, and in particular to a method and device for unloading Internet of vehicles computing tasks. The method fully considers the remaining available computing resources of adjacent service vehicles. First, an Analytic Hierarchy Process (AHP) is adopted to prioritize the tasks generated by the user vehicle; second, taking the reduction of task processing delay and the improvement of the unloading success rate as optimization targets, an improved sequence-to-sequence task scheduling model combined with an attention mechanism is designed; finally, the model is trained with an Actor-Critic (AC) reinforcement learning algorithm to obtain an AHP-AC-based task unloading strategy optimization model, which outputs the optimal task unloading and scheduling strategy. Compared with the prior art, the method improves the stability of the task unloading system, reduces task processing delay, and increases the execution success rate.

Description

Internet of vehicles computing task unloading method and device
Technical Field
The invention relates to the technical field of Internet of vehicles, in particular to an Internet of vehicles computing task unloading method and device.
Background
With the development of 5G technology, many new delay-sensitive and computation-intensive computing tasks, such as road condition monitoring, map navigation and path planning, have appeared in the Internet of vehicles. However, the widespread use of these computing tasks has caused a substantial increase in the amount of data, so that a vehicle with limited computing resources cannot handle such a large number of computation-intensive tasks and therefore fails to meet the quality of service required by the user. Computing unloading is a key advantage of edge computing: computing tasks generated by a vehicle can be unloaded to a nearby edge computing server or a nearby vehicle with spare computing resources for processing, which makes up for the shortage of computing resources in the on-board unit, reduces the execution delay of computing tasks, and meets the computing demands of users.
According to the unloading mode, computation unloading at the vehicular edge can be classified into unloading based on vehicle-to-roadside-unit communication and unloading based on vehicle-to-vehicle communication. Edge servers deployed in the vehicular network have abundant computing and storage resources and have become the main edge service nodes responsible for processing vehicle tasks. However, the high infrastructure construction cost limits the coverage and computing resources of the RSU (Road Side Unit), and when the number of tasks to be unloaded is too large, some tasks cannot be handled in time. Vehicles have not only communication functions but also certain computing and storage capabilities. This creates a good platform for the integrated utilization of idle vehicle resources, so that the edge capacity of the vehicular network can be greatly enhanced and the service quality of users improved. For delay-sensitive tasks with a small amount of computation and high real-time requirements, unloading them to adjacent vehicles saves backhaul delay and bandwidth consumption compared with transmitting them to the RSU. Therefore, vehicles with idle computing resources can be fully utilized to handle such computing tasks, while ensuring proper encryption and security during the transmission and processing of task data.
Currently, the usual models and methods for task unloading are Markov decision processes, heuristic algorithms and reinforcement learning. Markov decision processes are often used to build decision models: in a dynamic system whose state evolves randomly, a decision must be made at each decision epoch, and the cost is determined by that decision; their disadvantage is that they cannot handle randomness in the time elapsed between decision stages. Heuristic algorithms solve problems faster and more efficiently than traditional methods by trading optimality, accuracy, precision or completeness for speed; their disadvantage is that the iterative optimization takes a long time, so such problems cannot be solved efficiently. Reinforcement learning is another approach to distributed decision making that improves iteratively through constant interaction with the environment, gradually approaching an optimal solution; its disadvantage is that the high-variance problem faced by policy-gradient methods can make the training process unstable.
The key to the task offloading problem is the making of offloading decisions, which in turn is closely related to the choice of the vehicle unloading mode and of the model used to realize task unloading. Therefore, determining which unloading mode and which model can complete task unloading more reliably, and improve the efficiency and success rate of task unloading, has become the problem to be solved in this field.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a method and device for unloading Internet of vehicles computing tasks which can effectively improve the accuracy of predicting the computing resources required for task unloading, reduce the failure rate of task unloading, and improve the stability of the whole task unloading system.
The aim of the invention can be achieved by the following technical scheme:
according to a first aspect of the present invention, there is provided a method for offloading internet of vehicles computing tasks, comprising the steps of:
s1, initializing a task queue to be executed, a task state, a task vehicle number, a service vehicle number, available computing resources of the task vehicle and available computing resources of the service vehicle, wherein the task to be executed is generated by the task vehicle;
s2, prioritizing all tasks to be executed by using an analytic hierarchy process, and acquiring the current task state of each task by combining CPU execution time, calculation complexity, data volume and time delay tolerance;
s3, performing task scheduling by utilizing an improved sequence-to-sequence model based on a current task queue to be executed, obtaining a service vehicle number or a task vehicle number for executing each task, and obtaining a current unloading and scheduling strategy, wherein an unloading mode comprises local execution or unloading to a service vehicle for execution;
s4, acquiring the current task state and the currently available computing resources of the service vehicles based on the unloading and scheduling strategy;
and S5, training the improved sequence-to-sequence model by using an Actor-Critic algorithm based on the task state and the available computing resources of the service vehicle to obtain an optimal task unloading and scheduling strategy.
As a preferred technical solution, the encoder of the improved sequence-to-sequence model uses a CNN-RNN combined network, and an attention mechanism is added to process the output of the encoder.
As a preferred technical solution, the task scheduling process using the improved sequence-to-sequence model considers task allocation failure and timeout failure, and the specific process includes:
s301, initializing a task failure queue, a task processing queue, a task waiting queue and priorities of all tasks;
s302, determining the task with the highest priority in a current task waiting queue, and distributing a service vehicle or a task vehicle for the task with the highest priority;
s303, comparing the computing resources required by the task with the initially available computing resources of the allocated service vehicle or task vehicle:
when the initially available computing resources are smaller than the computing resources required by the task, the task allocation fails and the current task is added to the task failure queue; otherwise, the task waits until the task with the shortest remaining execution time in the current task processing queue finishes executing and releases the computing resources it occupied, after which the remaining execution time and total execution time of the current task and of all tasks in the task processing queue are updated immediately;
s304, judging whether the current task is overtime, if yes, adding the current task into a task failure queue, otherwise returning to S302.
As a preferred technical solution, the process of determining the unloading manner includes judging whether the task needs to be unloaded for execution: if so, the unloading decision is sent to the task vehicle and to the service vehicle with the corresponding number, and the task vehicle unloads the task to that service vehicle for execution; if not, the unloading decision is sent to the task vehicle, and the task vehicle keeps the task for local execution.
As a preferred technical solution, in the process of training the improved sequence-to-sequence model by using the Actor-Critic algorithm, the reward function used is reward = -(average_time + λ·q), where average_time represents the average time delay of all tasks, q represents the number of failed tasks, and λ is a preset weight factor.
As a preferred technical solution, the average time delay is expressed as average_time = max(T_0, T_1, ..., T_N) / M, where T_0 represents the total time delay of all tasks executed on the task vehicle, T_j represents the total time delay of all tasks executed on the j-th service vehicle, N represents the number of service vehicles, and M represents the total number of tasks.
As a preferred technical scheme, the total time delay of each task executed on the corresponding service vehicle is the sum of waiting time before task unloading, data uploading time, execution time of the task on the service vehicle and execution result feedback time.
As a preferred technical solution, the weight factor is determined according to the task success rate and the average time delay.
As a preferred technical solution, the task success rate is expressed as (M - q) / M, where q represents the number of failed tasks and M represents the total number of tasks.
According to a second aspect of the present invention, there is provided an internet of vehicles computing task offloading apparatus comprising a memory, a processor, and a program stored in the memory, the processor implementing the method when executing the program.
Compared with the prior art, the invention has the following beneficial effects:
1. In the process of task scheduling using the improved sequence-to-sequence model, the invention adopts the analytic hierarchy process for priority division and considers the CPU execution time, computational complexity, data volume and delay tolerance of each task to obtain the state of each task; the tasks are mutually independent with no dependency relationship, and during task scheduling a comprehensive judgment is made according to all state characteristics of the tasks and the size of the computing resources of the service vehicles to obtain the unloading and scheduling strategy, so that the different requirements of the tasks can be fully met and the stability of the system and the user experience are improved;
2. In the improved sequence-to-sequence model adopted by the invention, the encoder uses a CNN-RNN combined network and an attention mechanism is added to process the output of the encoder, so that the CNN helps to extract local features and the RNN captures long-term dependencies in the sequence; this effectively alleviates the problems of gradient vanishing and gradient explosion, gives the model better robustness, improves the accuracy of predicting the computing resources required for task unloading, and further improves task unloading efficiency;
3. In the task scheduling process, two kinds of task unloading failure, namely allocation failure and timeout failure, are considered; the reward function of the Actor-Critic algorithm is redesigned and an adjustable weight factor taking account of the task success rate and the average time delay is introduced into it; at the same time the Actor network (namely the sequence-to-sequence model) is improved, and the Actor-Critic reinforcement learning algorithm is used to iteratively optimize the task unloading strategy to obtain the optimal network parameters and the optimal AHP-AC model, so that the unloading and scheduling strategy finally output by the model can effectively reduce the failure rate of task unloading, that is, improve the success rate;
4. The unloading modes adopted by the invention include local execution and unloading to a service vehicle for execution; real-time tasks with a small amount of computation and high delay sensitivity are unloaded to other vehicles adjacent to the user vehicle, which makes full use of the parallel capability among multiple vehicles, reduces the waiting time of tasks and effectively improves their real-time performance;
5. The invention designs a task waiting queue in task scheduling and considers the waiting time of tasks in the service vehicle, which helps to design a more intelligent and efficient task scheduling strategy, so that the system can better meet the delay requirements of tasks and provide better service quality under limited computing resources.
Drawings
FIG. 1 is a diagram of a task offloading system architecture in an embodiment of the invention;
FIG. 2 is a schematic diagram of a hierarchical model of task priorities in an embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of a task offloading scheme in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a modified Seq2Seq model in an embodiment of the invention;
FIG. 5 is a schematic diagram of the Actor-Critic algorithm framework in accordance with an embodiment of the present invention;
FIG. 6 shows the average time delay for different weight factors according to the embodiment of the present invention;
FIG. 7 shows success rates for different weight factors in an embodiment of the invention;
FIG. 8 is an average time delay as the number of tasks increases in an embodiment of the present invention;
FIG. 9 shows the success rate of increasing the number of tasks in an embodiment of the present invention;
FIG. 10 shows the average delay as the CPU frequency increases in an embodiment of the invention;
FIG. 11 shows the success rate of increasing CPU frequency in an embodiment of the invention;
FIG. 12 shows the average time delay as the number of SVs increases in an embodiment of the present invention;
FIG. 13 shows the success rate when the number of SVs is increased in the embodiment of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
Examples
The embodiment provides a deep-reinforcement-learning-based method for unloading Internet of vehicles computing tasks. Tasks are prioritized before unloading, and by analysing the changes in the heterogeneous computing resources of the vehicles the task processing delay is reduced and the unloading success rate improved; a delay optimization model and an AHP-AC-based task unloading strategy optimization model are proposed, and the model is trained by deep reinforcement learning.
1. Task offloading system
(I) Offloading system architecture
The architecture of the vehicle computation unloading system adopted in this embodiment is shown in Fig. 1. A number of vehicles travel on a unidirectional straight road and are denoted by the set V = {1, 2, 3, ...}. An RSU equipped with a small server is deployed at the roadside; all vehicles on the road remain within the communication range of the RSU, and wireless communication is possible between vehicles.
The vehicle that generates tasks is called the Task Vehicle (TV), and the surrounding vehicles that can provide computing resources are called Service Vehicles (SV). When the TV generates tasks to be unloaded, it first sends a task unloading request to the RSU; the RSU then sends the task unloading decisions to the TV and the SVs respectively; the TV unloads the task to be unloaded to the designated SV, and after the task has been completely executed the SV returns the result to the TV. Here TV = {f_l, P_l^s}, where f_l is the computing capability of the TV, defined as CPU cycles per second, and P_l^s is the transmit power with which the TV sends task data to an SV. Assume that there are N SVs around the TV during task unloading, SV = {SV_1, SV_2, ..., SV_N}, where SV_j = {f_j, P_j^s}, j ∈ {1, 2, ..., N}, f_j is the computing capability of the j-th SV (also in CPU cycles per second) and P_j^s is the transmit power of the j-th SV. Each vehicle travels at a constant speed v, and the speeds of different vehicles follow a uniform distribution. The channel state is estimated from the distance between the TV and an SV, and the link duration is estimated from their relative position and relative speed. The total system time is divided into several time periods; within a single time period the TV generates M tasks, and the i-th task is defined as a triple Ω_i = {C_i, D_i, τ_i}, i ∈ {1, 2, ..., M}, where C_i denotes the CPU cycles required to complete Ω_i, D_i denotes the input data size of Ω_i, and τ_i denotes the maximum tolerable delay of Ω_i. It is assumed that all tasks are indivisible and that there is no dependency between tasks. Each task has two execution modes: 1) execute locally; 2) unload to a nearby SV with available computing resources. Each task can select only one of the execution modes. If unloading to an SV is selected, only one SV can be chosen, and an SV can process multiple tasks in parallel.
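For illustration, the entities of this system model can be captured by a few plain data structures. The following Python sketch is purely illustrative; the class and field names (Task, ServiceVehicle, cpu_cycles, and so on) are assumptions introduced here and reused by the later sketches, not notation from the original description.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """Task Omega_i = {C_i, D_i, tau_i} generated by the task vehicle (TV)."""
    cpu_cycles: float        # C_i: CPU cycles required to complete the task
    data_size: float         # D_i: input data size
    max_delay: float         # tau_i: maximum tolerable delay
    priority: float = 0.0    # assigned later by the AHP model
    remaining: float = 0.0   # remaining execution time once the task is running

@dataclass
class ServiceVehicle:
    """Service vehicle SV_j = {f_j, P_j^s} offering idle computing resources."""
    cpu_freq: float          # f_j: computing capability (CPU cycles per second)
    tx_power: float          # P_j^s: transmit power
    initial_cpu: float       # computing resources at the start of the period
    available_cpu: float     # computing resources currently unoccupied
    processing: List[Task] = field(default_factory=list)  # tasks executed in parallel
```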
(II) task prioritization model
When a user vehicle unloads computing tasks to other vehicles with idle computing resources, the urgency of the computing tasks differs, because different task attributes lead to different task types. The types of computing tasks generated by a vehicle include safety-critical emergency tasks, ordinary real-time tasks and other task types. In this embodiment, priority is used to represent the urgency of a task: under limited computing resources, the higher the priority of a task, the more urgent it is and the earlier it is processed. The analytic hierarchy process (AHP) is a multi-criteria decision method combining qualitative and quantitative analysis. It divides the factors involved in a decision into three layers: the target layer, the criterion layer and the scheme layer, forming a multi-layer structural model that is well suited to the problems of assigning weights to task attributes and dividing task priorities.
For the attributes of a task, this embodiment mainly considers three evaluation indexes: the computational complexity, the task data size and the delay tolerance. The hierarchical model of task priorities is shown in Fig. 2. Through the AHP method, tasks with high computational complexity, large data volume and low delay tolerance are given higher weights, so that the system can allocate the limited computing resources more reasonably under the delay constraint and improve the task unloading success rate.
The AHP algorithm steps are as follows:
step 1: comparing three evaluation indexes of the criterion layer according to table 1, and constructing an evaluation index judgment matrix (pairwise comparison matrix, PCM) from the target layer to the criterion layer to be a= (a) ij ) 3×3 Judgment matrix B of criterion layer to scheme layer 1 ,B 2 ,B 3 =(a ij ) M×M . Wherein the method comprises the steps of
TABLE 1 PCM quantized values
Step 2: obtaining B by sum-product method k Is the i-th weight of (2)
Where k represents the kth task, and i represents the ith evaluation index of the kth task. The weight matrix for all tasks is as follows
Step 3: the weight vector corresponding to A is obtained by the sum-product method to be beta= [ beta ] 122 ] T Wherein
Step 4: through consistency test of the weights, the weight vector S of all the tasks shown in the formula (5) is finally obtained, wherein the elements in the S consist of the weights of the corresponding calculation tasks
(III) time delay model
If the TV processes task Ω_i locally, the computation delay of the task is C_i / f_l.
If the TV unloads task Ω_i to another vehicle SV_j, the unloading process is as follows: the TV uploads the task data to SV_j, SV_j executes task Ω_i, and the result is transmitted back to the TV. The total time of task unloading is divided into four parts: the waiting time before task unloading, the data uploading time, the execution time of the task on SV_j, and the return time of the result. Before task unloading, the information of the task to be unloaded and the SV states are uploaded to the RSU through the wireless network, and the RSU then allocates SV_j for the task. Let R_l denote the link transmission rate when the task is unloaded to SV_j, and R_j the link transmission rate when SV_j returns the result to the TV; with a Shannon-capacity channel model,
R_l = B log2(1 + P_l^s h_j d_j^(-δ) / (N_0 + I))
where B represents the bandwidth of the vehicle transmission channel, d_j represents the distance between the TV and SV_j, δ represents the path-loss exponent, h_j represents the complex Gaussian channel coefficient, N_0 represents the additive white Gaussian noise, and I represents the interference generated by other vehicles during V2V transmission. The transmission time for task Ω_i to upload its data to SV_j is then D_i / R_l.
The execution time of task Ω_i on SV_j is C_i / f_j.
The return time of the result is η D_i / R_j, where η represents the ratio between the output data size and the input data size; typically η ≤ 1, so the return time may be ignored when calculating the task delay. The total delay of unloading task Ω_i is the sum of its waiting time before unloading, the uploading time, the execution time on SV_j and the result return time.
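Using the reconstructed expressions above, the local and unloading delays of a task can be evaluated as in the sketch below (reusing the Task sketch from the system-model section). The Shannon-capacity form of the link rate and all argument names are assumptions consistent with the listed symbols, not the patent's exact formulas.

```python
import math

def link_rate(bandwidth, tx_power, distance, path_loss_exp, channel_gain, noise, interference):
    """Shannon-type V2V link rate (assumed form of R_l and R_j)."""
    snr = tx_power * channel_gain * distance ** (-path_loss_exp) / (noise + interference)
    return bandwidth * math.log2(1.0 + snr)

def local_delay(task, f_local):
    """Delay when the TV executes the task itself: C_i / f_l."""
    return task.cpu_cycles / f_local

def offload_delay(task, sv_cpu_freq, rate_up, rate_down, wait_time, eta=0.0):
    """Waiting + uploading + execution on SV_j + result return (return ~ 0 when eta is negligible)."""
    t_up = task.data_size / rate_up
    t_exec = task.cpu_cycles / sv_cpu_freq
    t_back = eta * task.data_size / rate_down
    return wait_time + t_up + t_exec + t_back
```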
(IV) task scheduling model
For this optimization problem, this embodiment designs three queues to record the state of task scheduling:
(1) a failure queue (Fail Queue), which records tasks whose allocation failed or whose execution timed out;
(2) a processing queue (Processing Queue), which records the tasks currently being executed;
(3) a waiting queue (Waiting Queue), which records tasks that have not yet been scheduled.
The task scheduling process is shown in Algorithm 1. First, the task Ω with the highest priority in the Waiting Queue is found and scheduled. The initial computing resources of the SV or TV allocated to task Ω are compared with the computing resources required by task Ω: if the initial computing resources of the SV or TV are smaller than those required by task Ω, the allocation fails and the task is added to the Fail Queue; if the allocation succeeds, it is judged whether the currently available computing resources of the SV or TV are sufficient to execute the task, and if so, the task is added to the Processing Queue of the current SV or TV; otherwise task Ω waits. After the task with the shortest remaining execution time in the Processing Queue of the current SV or TV finishes executing and releases the computing resources it occupied, the remaining execution time and total execution time of task Ω and of all tasks in the Processing Queue are updated immediately. After the update, it is judged whether task Ω has timed out: if so, task Ω fails and is added to the Fail Queue; if not, task Ω is re-scheduled by calling Algorithm 1 recursively.
Table 2 task scheduling model Algorithm 1 flow
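The body of Table 2 (Algorithm 1) is not reproduced in this text. The following Python sketch only illustrates the flow just described; the resource-accounting convention and the attribute names reuse the illustrative data structures from the system-model section and are assumptions, not the patent's Algorithm 1 verbatim.

```python
def schedule_one(task, vehicle, now, fail_queue):
    """One pass of the scheduling flow for the highest-priority waiting task.

    `vehicle` is the SV (or the TV itself) chosen for the task by the Seq2Seq model.
    All tasks are assumed to be released at time 0, so `now` doubles as the waiting time.
    """
    if task.cpu_cycles > vehicle.initial_cpu:
        fail_queue.append(task)                      # allocation failure: the task can never fit
        return now
    # Wait until enough occupied resources are released by the running tasks.
    while task.cpu_cycles > vehicle.available_cpu and vehicle.processing:
        done = min(vehicle.processing, key=lambda t: t.remaining)
        now += done.remaining                        # advance time to the earliest completion
        for t in vehicle.processing:
            t.remaining -= done.remaining            # update remaining execution times
        vehicle.processing.remove(done)
        vehicle.available_cpu += done.cpu_cycles     # release the occupied resources
    exec_time = task.cpu_cycles / vehicle.cpu_freq
    if now + exec_time > task.max_delay:
        fail_queue.append(task)                      # timeout failure
    else:
        vehicle.available_cpu -= task.cpu_cycles
        task.remaining = exec_time
        vehicle.processing.append(task)
    return now
```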
The research aim of this embodiment is to improve the task execution success rate as much as possible while reducing the average delay of task execution. Therefore, the execution success rate of the computing tasks is defined as (M - q) / M,
where q is the number of tasks that failed allocation or timed out, that is, the number of tasks in the Fail Queue, and M is the total number of tasks. The total time delay of all tasks executed on the TV is denoted T_0, and the total time delay of all tasks executed on the j-th SV is denoted T_j. Since the vehicles compute in parallel, the total time delay of all tasks generated by the TV is the maximum of T_0 to T_N. The average time delay of all tasks is average_time = max(T_0, T_1, ..., T_N) / M.
the number of failed tasks is multiplied by a weight factor lambda and added to the reward function such that the greater the success rate and the smaller the average delay, the greater the reward obtained. The choice of the weighting factor lambda in the reward function needs to be weighted according to the importance of the task success rate and the average delay. The reward function is:
2. Internet of vehicles computing task unloading method based on deep reinforcement learning
The scheduling of Internet of vehicles computing tasks is an NP-hard problem, and traditional numerical methods and heuristic algorithms cannot solve it efficiently because their iterative optimization takes too long. Deep reinforcement learning can improve iteratively through constant interaction with the environment and thus gradually approach the optimal solution. However, in reinforcement learning the high-variance problem faced by policy-gradient methods can make the training process unstable. The Critic network in the Actor-Critic algorithm provides a baseline estimate for the policy, which reduces the variance of the policy gradient and makes the training process more stable; the Critic network estimates the long-term return of each state through a value function, helping the agent to better understand the environment and improve its policy. Therefore, this embodiment proposes a computing task unloading and scheduling method based on deep reinforcement learning (DRL) and trains the neural network with the Actor-Critic algorithm.
(I) Actor network design
Assume that the TV generates M = 6 computing tasks in total and that the number of SVs is N = 3; the TV is numbered 0 and the SVs are numbered 1, 2, 3 in order. One possible task unloading scheme is shown in Fig. 3: 0 means that Task 1 is executed locally on TV number 0, 2 means that Task 2 is executed on SV number 2, 1 means that Task 3 is executed on SV number 1, and so on; the execution decisions of the 6 tasks can then be expressed as {0, 2, 1, 3, 2, 1}.
The sequence-to-sequence model (Seq2Seq) is an encoder-decoder structure whose basic components are recurrent neural networks (RNN). The problem of assigning tasks to service vehicles can be converted into the problem of mapping a task sequence to a service vehicle number sequence of corresponding length, which is well suited to being solved with Seq2Seq. The RNN structures commonly used in Seq2Seq have natural sequence modelling capability, but may encounter gradient vanishing or gradient explosion when dealing with long sequences, whereas a CNN (convolutional neural network) can process inputs in parallel within a local scope, and using a CNN to extract spatial features can alleviate this problem. Therefore, this embodiment replaces the RNN of the encoder part in Seq2Seq with a combined CNN-RNN model, in which the CNN helps to extract local features and the RNN captures long-term dependencies in the sequence. In addition, the Seq2Seq model must compress all the information of the input sequence, so it has difficulty handling long input sequences. For this, the embodiment adds an attention mechanism: when the decoder produces an output, it searches for the part of the input sequence most relevant to that output, and the model then predicts the unloading vehicle number of the next task based on the context vector corresponding to that part of the input sequence and on all previously generated service vehicle numbers. The improved model concentrates attention on the part of the input sequence most relevant to the current task during decoding, and can handle long input sequences better than the original Seq2Seq model.
The improved Seq2Seq model is shown in Fig. 4. The input of the encoder is the task sequence to be scheduled obtained through the AHP calculation; the input of the decoder is the context vector of the previous time step and the hidden-layer output of the previous time step; the output is the conditional probability of the task at the next time step, from which the corresponding service vehicle number is obtained, and finally the execution decision coding sequence of all tasks is obtained through the Seq2Seq model.
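A minimal PyTorch sketch of such an encoder is given below (the simulation environment in this embodiment is built on PyTorch). The choice of a GRU cell, the layer sizes and the five-dimensional task-state input are assumptions made for illustration; Fig. 4 itself is not reproduced here.

```python
import torch
import torch.nn as nn

class CNNRNNEncoder(nn.Module):
    """Encoder of the improved Seq2Seq model: CNN for local features, RNN for sequence context."""
    def __init__(self, task_dim: int = 5, hidden: int = 128):
        super().__init__()
        # 1-D convolution over the task sequence extracts local features.
        self.conv = nn.Conv1d(task_dim, hidden, kernel_size=3, padding=1)
        # GRU captures long-term dependencies along the task sequence.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, tasks: torch.Tensor):
        # tasks: (batch, M, task_dim); each task state has 5 features
        # (CPU execution time, complexity, data size, delay tolerance, priority).
        x = self.conv(tasks.transpose(1, 2)).relu()    # (batch, hidden, M)
        outputs, h_last = self.rnn(x.transpose(1, 2))  # H: (batch, M, hidden)
        return outputs, h_last
```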
At time step t of the decoding phase, given the context vector c_t and all previously predicted results {y_1, ..., y_t}, the decoder predicts the next result y_{t+1}; the probability of the output result {y_1, ..., y_{t+1}} is the product of the conditional probabilities of the individual steps, where each conditional probability is modelled by the decoder network as in equation (17).
The hidden-layer vectors H of the encoder and the hidden-layer output s_t of the decoder at the previous time step are fed into the Attention module to obtain the attention value e_t, where s_0, the initial input of the Attention module at time step t = 0, is obtained from the last hidden layer h_M after a linear transformation; W_1 and v_1 are trainable parameters, and Attn is a fully connected neural network. The context vector c_t is then obtained from H and the attention value vector e_t:
c_t = e_t H (19)
c_t, s_t and the decoder output y_t at time step t are input into the decoder, and the hidden state s_{t+1} of time step t+1, together with the conditional probability required by equation (17), is obtained through the tanh activation function:
s_{t+1} = tanh(W_2 [y_t, c_t, s_t]) (20)
where W_2 and v_2 are trainable parameters.
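Equations (19) and (20) can be realised by a decoding step such as the sketch below; the additive scoring network (playing the role of Attn with parameters W_1 and v_1), the embedding of the previous output y_t and all dimensions are assumptions consistent with the surrounding description rather than the exact structure of Fig. 4.

```python
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    """One decoding step: attention over encoder outputs H, then the state update of eq. (20)."""
    def __init__(self, hidden: int = 128, n_vehicles: int = 4):
        super().__init__()
        self.W1 = nn.Linear(2 * hidden, hidden)      # scoring network Attn
        self.v1 = nn.Linear(hidden, 1, bias=False)
        self.W2 = nn.Linear(3 * hidden, hidden)      # state update of eq. (20)
        self.out = nn.Linear(hidden, n_vehicles)     # conditional probability of eq. (17)
        self.embed = nn.Embedding(n_vehicles, hidden)

    def forward(self, H, s_t, y_t):
        # H: (batch, M, hidden); s_t: (batch, hidden); y_t: (batch,) previous vehicle number
        scores = self.v1(torch.tanh(self.W1(
            torch.cat([H, s_t.unsqueeze(1).expand_as(H)], dim=-1))))   # (batch, M, 1)
        e_t = torch.softmax(scores, dim=1)                              # attention values
        c_t = (e_t * H).sum(dim=1)                                      # c_t = e_t H, eq. (19)
        y_emb = self.embed(y_t)
        s_next = torch.tanh(self.W2(torch.cat([y_emb, c_t, s_t], dim=-1)))  # eq. (20)
        probs = torch.softmax(self.out(s_next), dim=-1)                 # next-vehicle distribution
        return probs, s_next, c_t
```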
(II) Actor-Critic model training
The network training of this embodiment is shown in Fig. 5. State represents the state, which includes the task states and the service vehicle states; a task state consists of five parts, namely the execution time of the task on the CPU, the computational complexity, the data volume, the delay tolerance and the task priority, and a service vehicle state is the computing resources available on that vehicle. Action represents an action, namely the number of the server to which a task is to be unloaded, generated by the Actor network; the set of actions constitutes the unloading policy. Reward represents the reward obtained for executing the action. The training process is as follows: the state is input into the Actor network; the generated actions interact with the environment through Algorithm 1 and the reward is obtained; the states after the actions are executed and the rewards are taken as inputs of the Critic network, and the Actor network and the Critic network are updated through their loss functions.
The parameters of the Actor network are defined as θ, and reward(a) is the reward obtained by the Actor when strategy a is adopted, given by formula (15). The function J(θ) is defined as follows:
J(θ)=E(reward(a)) (22)
The goal of the Actor optimization is to maximize the reward value. In this embodiment the Actor network is trained with the REINFORCE algorithm; using a baseline, the policy gradient is
∇_θ J(θ) = E[(reward(a) - b(s)) ∇_θ log P_θ(a|s)]
where b(s) is the baseline function obtained from the Critic network, and P_θ(a|s) is the conditional probability of taking policy a in state s, produced by the Actor network and determined by equation (17). Assuming that the batch size of the training dataset is K, the gradient used to update the parameters θ can be expressed as
∇_θ J(θ) ≈ (1/K) Σ_{k=1}^{K} (reward(a_k|s_k) - b(s_k)) ∇_θ log P_θ(a_k|s_k)
and θ is updated by gradient ascent along this estimate.
The parameters of the Critic network are updated using a mean-square-error loss function:
L_Critic = (1/K) Σ_{k=1}^{K} (b(s_k) - reward(a_k|s_k))^2
where b(s_k) is the reward value predicted by the Critic network and reward(a_k|s_k) is the true value.
The training procedure of the Actor-Critic network is shown in Table 3; the learning rate is 0.0001 and Adam is used as the optimizer. The Critic network consists of an RNN layer, a convolutional layer and a fully connected network, and is used to compute the predicted reward value of the input state.
TABLE 3Actor-Critic network training procedure
3. Simulation experiment and result analysis
(I) Experimental parameter settings
In this embodiment a simulation environment is built on the PyTorch deep learning framework. The scenario is set so that task vehicles and service vehicles are randomly distributed on a road section 800 m long, covered by an RSU signal with a communication radius of 500 m, and the vehicles never leave the communication range of the RSU. Since the maximum tolerable delay of a task is 0.5-1 s, which is very short, it is assumed that the relative position between the task vehicle and the service vehicle does not change from the generation of a task to the completion of its execution. The specific settings of the experimental parameters are shown in Table 4.
TABLE 4 parameter settings
(II) analysis of results
Fig. 6 and Fig. 7 compare the average time delay and the task execution success rate of the proposed algorithm under different weight factors (λ = 1, 50, 100, 150, 200) when the number of SVs is 3. As can be seen from Fig. 6 and Fig. 7, when the weight factor λ = 100 the average delay of the tasks is lowest and the success rate is highest.
To evaluate the performance of the model, this embodiment tests the model with one batch of data and compares the proposed algorithm with the following three task allocation algorithms:
(1) Local-only processing (Only-local): each task is handled using only the local computing resources of the TV.
(2) Greedy-strategy-based algorithm (GBA): for each task that needs to be unloaded, the SV with the most remaining computing resources is always selected.
(3) Random algorithm (Random): for each task that needs to be unloaded, an SV is selected at random.
Fig. 8 and Fig. 9 compare, for the four task allocation algorithms with 3 SVs, the average task delay and the average execution success rate as the number of tasks changes. As can be seen from Fig. 8, as the number of tasks increases, the average execution delay of all four algorithms gradually increases. This is because, with the same computing resources, the larger the number of tasks, the more tasks must wait and the longer the waiting time becomes, so the overall execution delay of the tasks rises and the average delay gradually increases. As can be seen from Fig. 9, since the number of tasks keeps increasing while the available computing resources do not change, the number of tasks that fail by timing out keeps increasing, so the task execution success rate of the four algorithms keeps decreasing.
As can further be seen from Fig. 8 and Fig. 9, the average delay of Only-local is far greater than that of the other three algorithms and its execution success rate is low. GBA selects the vehicle with the most remaining computing resources each time, so its allocation success rate is relatively high, but its average delay is higher than that of Random and AHP-AC, making it difficult to meet the real-time requirements of the tasks. Compared with the other three algorithms, the AHP-AC algorithm used in this embodiment achieves the lowest average task execution delay and the highest execution success rate.
To evaluate the generalization ability of the proposed model, this embodiment verifies the model performance with respect to both the CPU frequency and the number of SVs. Fig. 10 and Fig. 11 compare, for the four task allocation algorithms with 3 SVs and 20 tasks, the average task delay and the average execution success rate as the CPU frequency changes. As can be seen from Fig. 10 and Fig. 11, as the CPU frequency increases, the average delay of the four algorithms decreases and the task execution success rate increases: when the performance of the service vehicles is enhanced, the execution delay of the tasks decreases correspondingly, the number of tasks in the whole queue that fail by timing out decreases, and the execution success rate improves.
Fig. 12 and Fig. 13 compare, for three task allocation algorithms with 20 tasks and a CPU frequency of 600 Mcycle/s, the average task delay and the average execution success rate as the number of SVs changes. As can be seen from Fig. 12 and Fig. 13, as the number of SVs increases, the average delay of the three algorithms decreases and the task execution success rate increases. This is because when the number of service vehicles increases, the degree of parallelism increases, the waiting time of the overall task set drops greatly, and the number of tasks successfully executed in the whole queue increases, so the average delay decreases and the execution success rate rises.
As can be seen from Fig. 8 to Fig. 13, after tasks are allocated by the AHP-AC algorithm proposed in this embodiment, the task execution results are always better than those of Only-local, GBA and Random, and the model shows good generalization performance. On the algorithmic side, using the AHP algorithm to prioritize the tasks before training with the AC algorithm can greatly improve the convergence speed of the AC algorithm. In practical applications, after the tasks are prioritized, tasks of high urgency can be processed preferentially, ensuring that the RSU makes more reliable unloading decisions.
For the problem of unloading computing tasks in the vehicular edge network, this embodiment proposes a delay model and an unloading strategy optimization model. By unloading the computing tasks of the vehicle terminal to nearby service vehicles, the delay with which the user vehicle's computing tasks are processed is further reduced, the real-time requirements of delay-sensitive tasks are met, and the utilization of the computing resources of the vehicle terminals is improved. Experimental results show that the proposed AHP-AC task unloading algorithm effectively improves the task execution success rate, achieves the goal of minimizing the computation delay of all tasks while satisfying the task delay constraints, and has good generalization performance. Future work will focus on reliable unloading in the case of task transmission failure, to further improve the practicality of the system.
Further, the embodiment also provides a device for unloading the internet of vehicles computing task, which comprises a memory, a processor and a program stored in the memory, wherein the processor realizes the method when executing the program. The device processor includes a Central Processing Unit (CPU) that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) or computer program instructions loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device can also be stored. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus. A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks. The processing unit performs the various methods and processes described above, such as the methods in the foregoing embodiments. For example, in some embodiments, the methods of the foregoing embodiments may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. When the computer program is loaded into RAM and executed by a CPU, one or more steps of the method in the foregoing embodiments may be performed. Alternatively, in other embodiments, the CPU may be configured to perform the methods in the foregoing embodiments in any other suitable manner (e.g., by means of firmware). The functions described above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (10)

1. The method for unloading the computing tasks of the Internet of vehicles is characterized by comprising the following steps of:
s1, initializing a task queue to be executed, a task state, a task vehicle number, a service vehicle number, available computing resources of the task vehicle and available computing resources of the service vehicle, wherein the task to be executed is generated by the task vehicle;
s2, prioritizing all tasks to be executed by using an analytic hierarchy process, and acquiring the current task state of each task by combining CPU execution time, calculation complexity, data volume and time delay tolerance;
s3, performing task scheduling by utilizing an improved sequence-to-sequence model based on a current task queue to be executed, obtaining a service vehicle number or a task vehicle number for executing each task, and obtaining a current unloading and scheduling strategy, wherein an unloading mode comprises local execution or unloading to a service vehicle for execution;
s4, acquiring the current task state and the currently available computing resources of the service vehicles based on the unloading and scheduling strategy;
and S5, training the improved sequence-to-sequence model by using an Actor-Critic algorithm based on the task state and the available computing resources of the service vehicle to obtain an optimal task unloading and scheduling strategy.
2. The method of claim 1, wherein the encoder of the improved sequence-to-sequence model uses a CNN-RNN combined network, and an attention mechanism is added to process the output of the encoder.
3. The method for offloading computing tasks on internet of vehicles according to claim 1, wherein the task scheduling process using the improved sequence-to-sequence model considers task allocation failure and timeout failure, and the specific process comprises:
s301, initializing a task failure queue, a task processing queue, a task waiting queue and priorities of all tasks;
s302, determining the task with the highest priority in a current task waiting queue, and distributing a service vehicle or a task vehicle for the task with the highest priority;
s303, comparing the computing resources required by the task with the initially available computing resources of the allocated service vehicle or task vehicle:
when the initially available computing resources are smaller than the computing resources required by the task, the task allocation fails and the current task is added to the task failure queue; otherwise, the task waits until the task with the shortest remaining execution time in the current task processing queue finishes executing and releases the computing resources it occupied, after which the remaining execution time and total execution time of the current task and of all tasks in the task processing queue are updated immediately;
s304, judging whether the current task is overtime, if yes, adding the current task into a task failure queue, otherwise returning to S302.
4. The method for offloading a computing task of claim 1, wherein the process of determining the unloading manner comprises judging whether the task needs to be unloaded for execution: if so, the unloading decision is sent to the task vehicle and to the service vehicle with the corresponding number, and the task vehicle unloads the task to that service vehicle for execution; if not, the unloading decision is sent to the task vehicle, and the task vehicle keeps the task for local execution.
5. The method for offloading internet of vehicles computing tasks according to claim 1, wherein in training the improved sequence-to-sequence model using the Actor-Critic algorithm, the reward function used is reward = -(average_time + λ·q), where average_time represents the average time delay of all tasks, q represents the number of failed tasks, and λ is a preset weight factor.
6. The internet of vehicles computing task offloading method of claim 5, wherein the average time delay is expressed as average_time = max(T_0, T_1, ..., T_N) / M, where T_0 represents the total time delay of all tasks executed on the task vehicle, T_j represents the total time delay of all tasks executed on the j-th service vehicle, N represents the number of service vehicles, and M represents the total number of tasks.
7. The method of claim 6, wherein the total latency of each task executing on the corresponding service vehicle is the sum of the waiting time before task unloading, the data uploading time, the execution time of the task on the service vehicle and the execution result return time.
8. The method of claim 5, wherein the weight factor is determined based on task success rate and average time delay.
9. The internet of vehicles computing task offloading method of claim 8, wherein the task success rate is expressed as (M - q) / M, where q represents the number of failed tasks and M represents the total number of tasks.
10. An internet of vehicles computing task offloading apparatus comprising a memory, a processor, and a program stored in the memory, wherein the processor, when executing the program, implements the method of any one of claims 1-8.
CN202311805132.7A 2023-12-26 2023-12-26 Internet of vehicles computing task unloading method and device Pending CN117793665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311805132.7A CN117793665A (en) 2023-12-26 2023-12-26 Internet of vehicles computing task unloading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311805132.7A CN117793665A (en) 2023-12-26 2023-12-26 Internet of vehicles computing task unloading method and device

Publications (1)

Publication Number Publication Date
CN117793665A true CN117793665A (en) 2024-03-29

Family

ID=90381052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311805132.7A Pending CN117793665A (en) 2023-12-26 2023-12-26 Internet of vehicles computing task unloading method and device

Country Status (1)

Country Link
CN (1) CN117793665A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination