CN114626298A

CN114626298A - State updating method for efficient caching and task unloading in unmanned aerial vehicle-assisted Internet of vehicles

Info

Publication number: CN114626298A
Application number: CN202210246271.XA
Authority: CN
Inventors: 秦晓琦; 胡楠; 马楠; 张治�
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2022-03-14
Filing date: 2022-03-14
Publication date: 2022-06-14

Abstract

The invention discloses a state updating method for efficient caching and task offloading in an unmanned aerial vehicle (UAV)-assisted Internet of Vehicles. First, to comprehensively ensure the safety of automatic driving, the freshness of the cached model used for computation and the freshness of the computation offloading process are considered together as the information freshness of a vehicle's state update. Second, combining vehicular edge computing, caching and UAV technologies, a strategy is designed that makes decisions on cache updating, user association and resource allocation in a dynamic environment, minimizing system energy consumption while ensuring the timeliness of information. Finally, a deep reinforcement learning algorithm based on the deep deterministic policy gradient (DDPG) is used: the experience replay pool is divided into two pools, and experiences are sampled from the two pools in proportion to train the neural network, which accelerates convergence, reduces reward oscillation after convergence, and raises the converged reward value, improving algorithm performance.

Description

State updating method for efficient caching and task unloading in unmanned aerial vehicle-assisted Internet of vehicles

Technical Field

The invention relates to the technical field of automatic driving, and in particular to a state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles.

Background

With the continuous development of the Internet of Vehicles, a wide range of applications and services has emerged, bringing a large amount of data traffic. As a result, the conflict between computation overload and the limited spectrum and computing resources of the traditional vehicle-to-base-station architecture has become increasingly prominent. This conflict makes it difficult for the traditional Internet of Vehicles to support delay-sensitive applications and services with diverse resource demands and to cope with sudden traffic situations. Automatic driving, one of the important application scenarios of the Internet of Vehicles, aggravates the problem further because it must sense the external environment frequently and therefore generates heavy data traffic. A large body of research shows that introducing mobile edge computing into the Internet of Vehicles architecture, that is, vehicular edge computing, moves a vehicle's task computation to the network edge, relieves pressure on the core network, and can effectively meet these challenges.

Caching is a technique that stores content on a server in advance based on prediction. It reduces repeated responses and is often used alongside edge computing to help optimize various performance indicators, including in strategies designed for vehicular edge computing. Introducing caching can effectively reduce computing delay while also reducing system energy consumption.

Owing to its flexibility and mobility, the unmanned aerial vehicle can be used to assist vehicular edge computing. When sudden traffic conditions or large traffic bursts occur, the vehicular edge computing architecture alone has difficulty supporting the various task requirements. In that case, a nearby unmanned aerial vehicle carrying an edge computing server can be dispatched to the road section with the burst traffic to handle it effectively. For this reason, unmanned aerial vehicles are also often used in the prior art as an auxiliary technology for vehicular edge computing.

In the various applications and services of the Internet of Vehicles, vehicles often place requirements on the freshness of the information they obtain. In an automatic driving scene, the task computation results obtained by a vehicle must reflect external conditions timely, accurately and efficiently, with only a small difference from the real external situation; that is, fresh computation results are required. In recent years, the age of information, defined as the time elapsed since the information was generated, has been adopted as a measure of information freshness, and much current research applies it to the computation offloading process to characterize the freshness of the results that process produces.

In research on vehicular edge computing for automatic driving, ensuring safety during driving is the primary objective. To ensure safety, a vehicle must sense environmental information frequently and interact with the environment intensively, and it continuously generates dense computing tasks, which challenges conventional Internet of Vehicles architectures. Existing techniques ensure the safety and user experience of automatic driving by bounding the delay of the computation offloading process. In practice, however, delay alone is not a sufficient indicator of whether the computation result obtained by an automatic driving vehicle reflects the real external situation timely and accurately. The invention therefore uses the age of information to characterize the difference between the computation result obtained by the vehicle and the actual external situation. In addition, conventional computation offloading strategies that consider the age of information usually focus only on the offloading process and ignore the freshness of the cached computation model that is used. To address this shortcoming, the invention also counts the freshness of the cached content toward the freshness of the computation result, so that the freshness of the result obtained by the vehicle is comprehensively ensured.

Disclosure of Invention

The invention addresses the high safety and timeliness requirements, high vehicle mobility, dense tasks, fast network topology change and high system overhead found in a cache-assisted UAV-vehicle edge computing architecture in automatic driving scenes. Combining caching and unmanned aerial vehicle technologies, it designs a strategy that makes decisions on cache updating, user association and resource allocation in a dynamic environment so as to minimize system energy consumption while guaranteeing information timeliness, and on this basis provides a state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles.

In order to achieve the above purpose, the invention provides the following technical scheme:

A state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles, wherein the system model comprises a macro base station equipped with an MEC server, an unmanned aerial vehicle with computing capability and a plurality of vehicles with computing capability, and the method comprises the following steps:

S1, calculating the freshness of the cached data used by the tasks and the freshness of the offloading process, and taking them together as the freshness of the state update of the vehicle;

S2, minimizing the total energy consumption of the system in time slot t, wherein the total energy consumption comprises the energy consumed by updating cache content in time slot t and the energy consumed by processing vehicle tasks in time slot t;

S3, dividing the experience replay pool using an algorithm based on the deep deterministic policy gradient, and sampling experiences from the two resulting pools in proportion to train the neural network.

Further, the method for calculating the freshness of the cache model in step S1 is as follows:

definition of

And

indicating whether vehicle i and UAV j have cache content w in time slot t,

and

indicate that content w is cached, and 0 otherwise; the cache capacity constraints of the vehicle and the UAV are expressed as:

wherein,

and

represent the cache capacities of vehicle i and UAV j, and l_w denotes the data size of content w;

definition of

And

two binary variables, representing whether the vehicle i and the UAV j update or replace the buffer w at the time slot t,

and

indicate that cached content w is updated or replaced in time slot t, and 0 otherwise; for vehicle i:

similarly, for UAV j, we can get:

and

respectively represent the timeliness, namely the freshness, of the cached content of the vehicle and the UAV; assuming that once content w has been replaced by another type of cached data its freshness is set to an infinitely large number I, we get:

further, the calculation method of the freshness of the uninstallation process in step S1 is:

the vehicle generates a computing task and sends a resource access request to the MeNB, the request including {e_i(t), s_w, z_w, direction of travel, speed of travel, current position} for task w, where e_i(t) indicates the task type of vehicle i in time slot t, s_w indicates the size of task w, and z_w represents the number of CPU cycles required to compute task w; after the MeNB collects this information, it decides how the task of vehicle i is handled, using

Indicating whether the task is processed locally, by

Expressed in several ways:

1) local calculation:

2) unloading to a UAV:

3) offloading to MeNB:

4) temporarily not computing:

then there are:

further, when calculating locally, use

to denote the computing capability of vehicle i; the local computation delay of vehicle i is expressed as:

the corresponding energy consumption is expressed as:

the weighting factor μ_i represents the energy consumed by the CPU of vehicle i per computation cycle, expressed as:

the local computation delay must not exceed one time slot τ, that is:

further, when unloaded to UAV, the transmission rate between vehicle i and UAV j is expressed as:

where b_{i,j}(t) is the bandwidth used when vehicle i communicates with UAV j or the MeNB, p is the power with which the vehicle transmits the task, β_{i,j}(t) is the channel gain when vehicle i communicates with UAV j, and σ² is the power spectral density of white noise;

the channel gain between vehicle i and UAV j is expressed as:

where d_{i,j}(t) represents the distance between vehicle i and UAV j;

establishing a coordinate axis on the region

to represent the x and y coordinates of the positions of the vehicle and UAV, and denote the velocities of the vehicle and UAV as

with the sign indicating the direction of travel; the distance between vehicle i and UAV j is expressed as:

if the task of vehicle i is offloaded to UAV j in time slot t, vehicle i must remain within the coverage of UAV j for the whole time slot, i.e., the distance between vehicle i and UAV j at the start of the next time slot must not exceed the communication radius R of the UAV, so that:
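The coverage condition above can be illustrated with a minimal sketch. The formula itself is not reproduced in this text, so the motion model here is an assumption: vehicle and UAV move along the x axis at signed constant speeds during one time slot, and only the horizontal distance is checked. The function name and parameters are hypothetical.

```python
import math


def stays_in_coverage(vehicle_xy, vehicle_speed, uav_xy, uav_speed, radius, tau):
    """Check whether a vehicle is still inside a UAV's range at the next slot start.

    Assumptions (not taken from the source formula): motion is along the
    x axis only, speeds are signed (sign = direction of travel), and the
    UAV altitude is ignored.
    """
    next_vehicle_x = vehicle_xy[0] + vehicle_speed * tau
    next_uav_x = uav_xy[0] + uav_speed * tau
    # Distance between the two at the start of the next time slot.
    d_next = math.hypot(next_vehicle_x - next_uav_x, vehicle_xy[1] - uav_xy[1])
    return d_next <= radius
```

For example, a vehicle at (0, 0) driving at +20 m/s toward a stationary UAV at (50, 0) with R = 100 m and τ = 1 s remains in coverage, while the same vehicle driving at -60 m/s does not.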

further, when unloading to MeNB, the transfer rate between vehicle i and MeNB:

where b_{i,M+1}(t) is the bandwidth for communication between vehicle i and the MeNB and g_{i,M+1}(t) is the square of the average channel gain between vehicle i and the MeNB; the channel gain between vehicle i and the MeNB is:

the time taken by vehicle i to transmit the task to UAV j or the MeNB is expressed as:

the delay for the MEC server in the UAV or the MeNB to compute the task of vehicle i is:

wherein,

represents the computing resources allocated to vehicle i by MEC server j in time slot t;

the total time delay for unloading is:

the total offloading delay is constrained not to exceed one time slot:

energy consumption during unloading:

definition of

to indicate the freshness of the state update of vehicle i in time slot t, i.e. the age of the state update, and use

to indicate the time at which task w (the first task awaiting processing in the task queue) was generated, so the age of the state update is expressed as:

which is required not to exceed the threshold A_th:

Further, the energy consumption ξ(t) caused by cache updates in time slot t is expressed as:

where θ (J/bit) is a weighting factor that converts the amount of cache update data into energy, representing the energy consumed by caching 1 bit of data.

The energy consumption E_i(t) caused by task processing in time slot t is expressed as:

the energy consumption φ(t) of the system in time slot t due to task transmission is expressed as:

total energy consumption of system in time slot t

Expressed as:

further, step S2 minimizes the system energy consumption for T slots, and this optimization problem is expressed as:

Further, in step S3, explored experiences are stored in different experience replay pools according to their quality, and the pool is divided as follows: a threshold separating better experiences from poorer ones is found, and the experience replay pool is divided by this threshold into a better-experience pool and a poorer-experience pool.

Further, in step S3, the minimization problem of step S2 is converted into an MDP problem, where { Sc, Ac, Tc, Rc } is defined by a tuple, Sc is a set of system states, Ac is a set of system actions, T is a set of system actions, and^c＝{p(s^c′|s^c，a^c) Is the set of transition probabilities, R^c：S^c×A^c→R^cIs the reward function, the strategy pi is the mapping of Sc to Sc, the MDP problem is defined as follows:

state space: in the time slot t, defining a set of system states as coordinates of the unmanned aerial vehicle and the vehicle, a task request type of each vehicle, cache states of the unmanned aerial vehicle and the vehicle, and an information age of a first task to be processed of the vehicle;

action space: in time slot t, the action space of the system consists of the association between each vehicle and each cache node, and whether each existing cached item is updated;

the reward function: the reward function is set as the sum of the energy consumption and penalty functions of the system.

Compared with the prior art, the invention has the beneficial effects that:

Traditional task offloading strategies focus only on the computation offloading process and ignore the freshness of the cached data used by the computing task. To address this shortcoming, the invention considers not only the freshness of information in the computation offloading process but also counts the freshness of the cached data used in the task computation toward the freshness of the vehicle state update, so that the freshness of the vehicle state update is comprehensively ensured.

In terms of system energy consumption, unlike conventional automatic driving research, the invention considers a green system and jointly accounts for the information freshness of the computation results obtained by the vehicle and the energy consumption of the system. It targets the high safety and timeliness requirements, high vehicle mobility, dense tasks, fast network topology change and high system overhead of a cache-assisted edge computing architecture in automatic driving scenes. Combining vehicular edge computing, caching and unmanned aerial vehicle technologies, it designs a strategy that makes decisions on cache updating, user association and resource allocation in a dynamic environment, minimizing system energy consumption while ensuring the timeliness of vehicle state updates.

To solve this problem, the invention uses deep reinforcement learning, taking the deep deterministic policy gradient (DDPG) algorithm as its basis. Because the state space and action space in this method are large, the algorithm converges slowly. The traditional DDPG algorithm is therefore improved: a threshold separating better experiences from poorer ones is found through a large number of experiments, the experience replay pool is divided by this threshold into a better-experience pool and a poorer-experience pool, and during training experiences are sampled from the two pools in proportion to train the neural network. Simulation shows that the improved algorithm converges faster, reaches a higher reward value after convergence, and oscillates less.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them.

Fig. 1 is a system model of a state updating method for efficient caching and task unloading in an unmanned aerial vehicle-assisted internet of vehicles according to an embodiment of the present invention.

Detailed Description

The invention designs a state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles; the system model is shown in Fig. 1. Consider an urban region in which the road is a two-lane, two-way section, and a macro base station (Macro eNodeB, MeNB) equipped with an MEC server provides full signal coverage for the whole road section. M unmanned aerial vehicles (UAVs) with computing capability are deployed above the road. There are N vehicles, each equipped with an MEC server, which periodically generate computing tasks of W different types. Vehicle density and travel speed in the region are random, to simulate real urban traffic. Within the region, the MeNB collects information and makes decisions. A time-slotted system is considered in the present embodiment.

1. Cache update policy

In the present invention, an MEC server can process a task only if the corresponding content is already cached on that server. The MeNB is assumed to cache all content and to keep it updated so that it is always fresh. The UAVs and vehicles have limited cache capacity. A cache update can be completed within one time slot. T time slots are considered.

Definition of

And

indicating whether vehicle i and UAV j have cache content w in time slot t,

and

indicate that content w is cached, and 0 otherwise; the cache capacity constraints of the vehicle and the UAV are expressed as:

wherein,

and

represent the cache capacities of vehicle i and UAV j, and l_w denotes the data size of content w.

Definition of

And

and

indicate that cached content w is updated or replaced in time slot t, and 0 otherwise. Because the cache capacity of the vehicle and the UAV is limited, outdated cached data is either updated to a newer version or replaced with other content. Clearly, if the vehicle or UAV has not cached w before time slot t, the cached data w cannot be updated in time slot t, so we get:

similarly, for UAV j, we can get:

In the invention, the timeliness of a computation result depends on the timeliness of the cached model used to compute the vehicle task and on the transmission and computation processes.

And

denote the timeliness, i.e. the freshness, of the cached content of the vehicle and the UAV, respectively. It is assumed that once content w has been replaced by another type of cached data, its freshness is set to an infinitely large number I. So we get:
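As an illustration of how the cache freshness could evolve from slot to slot, the following is a minimal sketch. The update formula is not reproduced in this text, so the rules below are assumptions: an updated item restarts at an age of one slot, an untouched item ages by one slot, and an item that is replaced (or not cached) takes the infinite value I. The function and constant names are hypothetical.

```python
import math

# Stands in for the "infinite number I" mentioned in the text.
INFINITE_AGE = math.inf


def next_cache_age(age, cached, updated, replaced, slot=1.0):
    """Return the age of cached content w after one time slot.

    Assumed rules (not the source formula):
    - replaced by other content, or not cached -> infinite age
    - updated in this slot                     -> age resets to one slot
    - otherwise                                -> age grows by one slot
    """
    if replaced or not cached:
        return INFINITE_AGE
    if updated:
        return slot
    return age + slot
```

Under these assumed rules, content with age 3 that is left untouched has age 4 in the next slot, and any update brings it back to 1.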

2. Computation offloading policy

Once the vehicle has generated a computing task, it sends a resource access request to the MeNB, the request including {e_i(t), s_w, z_w, direction of travel, speed of travel, current position} for task w, where e_i(t) indicates the task type of vehicle i in time slot t, s_w represents the size of task w, and z_w represents the number of CPU cycles required to compute the task; after the MeNB collects this information, it decides how the task of vehicle i is handled, using

Indicating whether the task is processed locally, by

Expressed in several ways:

1) local calculation:

2) unloading to a UAV:

3) offloading to MeNB:

4) temporarily, not calculating:

then there are:
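The four cases above are mutually exclusive. A minimal sketch of that check follows, assuming (the constraint itself is not reproduced in this text) that each case is encoded as a 0/1 indicator and that exactly one indicator is set per vehicle per time slot. The function and argument names are hypothetical.

```python
def is_valid_decision(local, to_uav, to_menb, deferred):
    """Check that exactly one handling option is chosen for a task.

    local, to_menb, deferred: 0/1 indicators.
    to_uav: list of 0/1 indicators, one per UAV j.
    Assumption: the options are one-hot, i.e. the indicators sum to 1.
    """
    flags = [local, *to_uav, to_menb, deferred]
    return all(f in (0, 1) for f in flags) and sum(flags) == 1
```

For instance, offloading to the second of three UAVs corresponds to `is_valid_decision(0, [0, 1, 0], 0, 0)`.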

2.1 local calculation:

by using

the corresponding energy consumption is as follows:

The weighting factor μ_i represents the energy consumed by the CPU of vehicle i per computation cycle, so that:

In this study, the local computation delay is constrained not to exceed one time slot τ, that is:
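The local-computation quantities can be sketched as follows. The formulas are not reproduced in this text, so the code assumes the commonly used model in which the delay is the required CPU cycles divided by the CPU frequency and the per-cycle energy is μ_i = κ·f², where κ is a hypothetical chip-dependent constant. All function names are illustrative.

```python
def local_delay(z_w, f_i):
    """Delay (s) to run z_w CPU cycles on a CPU of frequency f_i (cycles/s)."""
    return z_w / f_i


def local_energy(z_w, f_i, kappa=1e-27):
    """Energy (J) of local computation.

    Assumption: energy per cycle mu_i = kappa * f_i**2, a common model;
    kappa is a hypothetical effective-capacitance coefficient.
    """
    mu_i = kappa * f_i ** 2
    return mu_i * z_w


def local_feasible(z_w, f_i, tau):
    """True if the local delay fits within one time slot tau."""
    return local_delay(z_w, f_i) <= tau
```

With z_w = 10^9 cycles and f_i = 2 GHz, this model gives a delay of 0.5 s and, for κ = 10^-27, an energy of about 4 J.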

2.2 task offloading

In case of task offloading, the MEC server in the UAV or MeNB performs the computational task.

(1) Unloading to a UAV:

When a vehicle transmits a computing task to a UAV, the transmission interferes with other UAVs. Thus, the transmission rate between vehicle i and UAV j may be expressed as:

where b_{i,j}(t) is the bandwidth used when vehicle i communicates with UAV j or the MeNB, p is the power with which the vehicle transmits the task, β_{i,j}(t) is the channel gain when vehicle i communicates with UAV j, and σ² is the power spectral density of white noise.
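A rate of this kind is typically computed with the Shannon capacity formula. The following minimal sketch assumes that form (the exact expression is not reproduced in this text): noise power is σ² times the bandwidth, and interference from other transmissions is added to the noise. The function and parameter names are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd, interference_w=0.0):
    """Achievable rate in bit/s under an assumed Shannon-capacity model.

    noise_psd is the white-noise power spectral density (W/Hz);
    interference_w is the total received interference power (W).
    """
    noise_w = noise_psd * bandwidth_hz
    sinr = tx_power_w * channel_gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)
```

When the received signal power equals the noise power and there is no interference, the ratio is 1 and the rate equals the bandwidth in bit/s.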

The channel gain between vehicle i and UAV j may be expressed as:

where d_{i,j}(t) represents the distance between vehicle i and UAV j. Establishing a coordinate axis on the region

(2) offloading to MeNB:

transmission rate between vehicle i and MeNB:

where b_{i,M+1}(t) is the bandwidth when vehicle i communicates with the MeNB, p is the transmission power when vehicle i transmits a task to the MeNB, and g_{i,M+1}(t) is the square of the average channel gain between vehicle i and the MeNB.

The time taken by vehicle i to transmit task w to UAV j or the MeNB may be expressed as:

The delay for the MEC server in the UAV or the MeNB to compute the task of vehicle i may be expressed as:

wherein,

indicates the computing resources allocated to vehicle i by MEC server j in time slot t. Thus, the total offloading delay is:

The offloading delay is constrained not to exceed one time slot:
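The delay terms of the offloading case can be sketched as below. The formulas are not reproduced in this text; the code assumes the usual form in which transmission time is the task size divided by the rate and computation time is the required cycles divided by the allocated computing resource, with result download time neglected. Names are hypothetical.

```python
def offload_delay(s_w_bits, z_w_cycles, rate_bps, f_alloc_hz):
    """Total offloading delay (s) = transmission time + edge computation time.

    Assumption: the time to return the result is neglected.
    """
    t_tx = s_w_bits / rate_bps        # time to upload the task
    t_cmp = z_w_cycles / f_alloc_hz   # time to compute on the MEC server
    return t_tx + t_cmp


def offload_feasible(s_w_bits, z_w_cycles, rate_bps, f_alloc_hz, tau):
    """True if the whole offloading process fits in one time slot tau."""
    return offload_delay(s_w_bits, z_w_cycles, rate_bps, f_alloc_hz) <= tau
```

A 1 Mbit task uploaded at 10 Mbit/s and needing 10^9 cycles on a 5 GHz allocation takes about 0.1 s + 0.2 s = 0.3 s under this model.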

energy consumption during unloading:

definition of

to indicate the freshness of the state update of vehicle i in time slot t, i.e., the age of the state update. Use

to represent the time at which task w was generated, so the age of the state update can be expressed as:

which is required not to exceed the threshold A_th:
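One way to combine the two freshness components is sketched below. The age formula is not reproduced in this text, so the additive form is an assumption: the age of a state update is taken as the waiting time since the task was generated, plus the processing delay, plus the age of the cached model used. Function and parameter names are hypothetical.

```python
def state_update_age(now, generated_at, processing_delay, cache_age):
    """Age of a state update under an assumed additive model.

    now, generated_at: times in the same unit (e.g. seconds);
    processing_delay: local or offloading delay of the task;
    cache_age: freshness (age) of the cached model used for the task.
    """
    return (now - generated_at) + processing_delay + cache_age


def within_threshold(age, a_th):
    """True if the age satisfies the A_th constraint."""
    return age <= a_th
```

For example, a task generated at t = 3 and served at t = 5 with a delay of 0.3 and a cache age of 2 has an age of 4.3 under this model.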

3. System objective

The energy consumption of the system comes from two sources: updating cache content and processing vehicle tasks. The energy consumption ξ(t) generated by cache updates in time slot t is:

where θ (J/bit) is a weighting factor that converts the amount of cache update data into energy, representing the energy consumed by caching 1 bit of data.
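The cache update energy can be sketched as θ multiplied by the total amount of data updated in the slot. The exact summation is not reproduced in this text, so the sketch assumes one 0/1 update flag per (node, content) pair paired with the content size l_w in bits. Names are hypothetical.

```python
def cache_update_energy(update_flags, sizes_bits, theta):
    """Energy (J) spent on cache updates in one time slot.

    update_flags: iterable of 0/1 flags, one per (node, content) pair;
    sizes_bits:   matching iterable of content sizes l_w in bits;
    theta:        energy per cached bit (J/bit).
    """
    updated_bits = sum(u * l for u, l in zip(update_flags, sizes_bits))
    return theta * updated_bits
```

Updating two items of 1000 and 500 bits with θ = 10^-6 J/bit costs about 1.5 × 10^-3 J.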

The energy consumption E_i(t) generated by task processing in time slot t is:

in summary, the system consumes the total energy in the time slot t

The aim of the invention is to minimize the system energy consumption of T time slots while satisfying various constraints, and this optimization problem can be expressed as:

s.t. State update age constraints: (3)(4)(5)(6)(22)(23)

Cache constraints: (1)(2)(24)

Task processing constraints: (7)-(21)(25)(26)(27)

4. Improved deep deterministic policy gradient algorithm

In the system objective of the present invention,

and

are discrete variables, while the bandwidth allocation b_{i,j}(t) is a continuous variable. The problem is therefore a mixed-integer nonlinear program (MINLP), which is difficult to solve. To cope with the continuous generation of tasks, high mobility and rapidly changing channel conditions in the Internet of Vehicles environment, a fast decision algorithm based on deep reinforcement learning is proposed.

Therefore, the present invention converts the system objective into a Markov decision process (MDP) defined by the tuple {S^c, A^c, T^c, R^c}, where S^c is the set of system states, A^c is the set of system actions, T^c = {p(s^c′ | s^c, a^c)} is the set of transition probabilities, and R^c: S^c × A^c → ℝ is the reward function. The policy π is a mapping from S^c to A^c, so the MDP is defined as follows:

(1) state space: in the time slot t, a set of system states is defined as coordinates of the unmanned aerial vehicle and the vehicle, a task request type of each vehicle, cache states of the unmanned aerial vehicle and the vehicle, and an information age of a first task to be processed of the vehicle.

(2) Action space: in time slot t, the action space of the system consists of the association between each vehicle and each cache node, and whether each existing cached item is updated.

(3) Reward function: the reward function is set as the sum of the energy consumption and penalty functions of the system. The penalty function discourages two violations: the size of the cached data exceeding the cache capacity, and a vehicle whose task is offloaded to a UAV leaving that UAV's communication range before the time slot ends.
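A reward of this shape can be sketched as follows. The sign convention and weights are assumptions: since the objective is to minimize energy while the agent maximizes reward, the sketch returns the negative of energy plus penalty, and the penalty weights are hypothetical tuning parameters.

```python
def reward(energy_j, cache_overflow_bits, coverage_violations,
           w_cache=1.0, w_coverage=1.0):
    """Reward for one time slot under an assumed sign convention.

    energy_j:            total system energy in the slot (J);
    cache_overflow_bits: amount by which cached data exceeds capacity;
    coverage_violations: number of offloaded vehicles that leave UAV range.
    """
    penalty = (w_cache * max(0.0, cache_overflow_bits)
               + w_coverage * coverage_violations)
    # Negated so that lower energy and fewer violations give a higher reward.
    return -(energy_j + penalty)
```

With no violations the reward is simply the negated energy, so a slot that consumes 2 J yields a reward of -2.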

Because the system is random, the state transition probabilities are difficult to model. We therefore use a model-free reinforcement learning algorithm based on the deep deterministic policy gradient (DDPG) to learn and update the computing resource allocation strategy. The conventional DDPG algorithm samples data randomly during training without regard to data quality; in the present invention, by contrast, explored experiences are stored in different experience replay pools according to their quality. Poorer and better experiences are then sampled at random, in proportion, within a certain number of steps, which removes correlation among the data, improves the stability of neural network training, and reduces oscillation.
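The two-pool experience replay described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: experience quality is measured by the reward of the transition, the threshold and the sampling ratio are hypothetical parameters, and the class name is invented for the example. The actor and critic networks of DDPG are omitted.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Experience replay split into a better pool and a poorer pool."""

    def __init__(self, threshold, capacity=10000, good_ratio=0.7, seed=None):
        self.threshold = threshold      # assumed: reward value separating the pools
        self.good_ratio = good_ratio    # assumed: share of each batch from the better pool
        self.good = deque(maxlen=capacity)
        self.poor = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        """Store a transition in the pool chosen by its reward."""
        pool = self.good if reward >= self.threshold else self.poor
        pool.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a mini-batch from both pools in the configured proportion."""
        n_good = min(len(self.good), round(batch_size * self.good_ratio))
        n_poor = min(len(self.poor), batch_size - n_good)
        batch = (self.rng.sample(list(self.good), n_good)
                 + self.rng.sample(list(self.poor), n_poor))
        self.rng.shuffle(batch)  # mix the pools to break ordering
        return batch
```

With `good_ratio=0.7`, a batch of 10 contains 7 transitions from the better pool and 3 from the poorer pool whenever both pools hold enough samples; the batch would then feed the usual DDPG critic and actor updates.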

First, to comprehensively ensure the safety of automatic driving, the method considers both the freshness of the cached model used for computation and the freshness of the computation offloading process, and uses them together as the information freshness of the computation result obtained by a vehicle. This is more realistic than previous studies.

Second, in an automatic driving scene, the vehicle must interact with the external environment frequently in order to sense it. The invention therefore targets the high safety and timeliness requirements, high vehicle mobility, dense tasks, fast network topology change and high system overhead of a cache-assisted edge computing architecture in automatic driving scenes. Combining vehicular edge computing, caching and unmanned aerial vehicle technologies, it designs a strategy that makes decisions on cache updating, user association and resource allocation in a dynamic environment, minimizing system energy consumption while ensuring information timeliness.

Finally, the problem takes the form of a mixed-integer nonlinear program (MINLP), which is hard to solve. The method therefore uses deep reinforcement learning with an algorithm based on the deep deterministic policy gradient, divides the experience replay pool, and samples experiences from the two pools in proportion to train the neural network, thereby accelerating convergence, reducing reward oscillation after convergence, raising the converged reward value and improving algorithm performance.

The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles, wherein the system model comprises a macro base station equipped with an MEC server, an unmanned aerial vehicle with computing capability and a plurality of vehicles with computing capability, and the method comprises the following steps:

S1, calculating the freshness of the cache model and the freshness of the offloading process, and taking them together as the freshness of the vehicle state update;

2. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein the method for calculating the freshness of the cache model in step S1 is:

definition of

And

indicating whether vehicle i and UAV j have cache content w in time slot t,

and

indicate that content w is cached, and 0 otherwise; the cache capacity constraints of the vehicle and the UAV are expressed as:

wherein,

and

definition of

And

and

for UAV j:

and

3. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein the method for calculating the freshness of the offloading process in step S1 is:

the vehicle generates a computing task and sends a resource access request to the MeNB, the request including {e_i(t), s_w, z_w, direction of travel, speed of travel, current position} for task w, where e_i(t) indicates the task type of vehicle i in time slot t, s_w indicates the size of task w, and z_w represents the number of CPU cycles required to compute task w; after the MeNB collects this information, it decides how the task of vehicle i is handled, using

Indicating whether the task is processed locally, by

Expressed in several ways:

1) local calculation:

2) unloading to a UAV:

3) offloading to MeNB:

4) temporarily, not calculating:

then there are:

4. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 3, wherein the local computation uses

the corresponding energy consumption is expressed as:

the local computation delay must not exceed one time slot τ, that is:

5. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 3, wherein, when offloading to a UAV, the transmission rate between vehicle i and UAV j is expressed as:

the channel gain between vehicle i and UAV j is expressed as:

where d_{i,j}(t) represents the distance between vehicle i and UAV j;

establishing a coordinate axis on the region

to represent the x and y coordinates of the positions of the vehicle and UAV, and denote the velocities of the vehicle and UAV as

R represents the communication coverage radius of the UAV.

6. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 3, wherein, when offloading to the MeNB, the transmission rate between vehicle i and the MeNB is:

where b_{i,M+1}(t) is the bandwidth for communication between vehicle i and the MeNB and g_{i,M+1}(t) is the square of the average channel gain between vehicle i and the MeNB; the channel gain between vehicle i and the MeNB is:

the time taken by vehicle i to transmit task w to UAV j or the MeNB is expressed as:

the delay for the MEC server in the UAV or the MeNB to compute the task of vehicle i is expressed as:

wherein,

the total time delay for unloading is:

the offloading delay is constrained not to exceed one time slot:

energy consumption during unloading:

definition of

Representing the time when task w is generated, the age of the state update is represented as:

which is required not to exceed the threshold A_th:

7. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein the energy consumption ξ(t) generated by cache updates in time slot t is expressed as:

where θ (J/bit) is a weighting factor that converts the amount of cache update data into energy, representing the energy consumed by caching 1 bit of data;

total energy consumption of system in time slot t

Expressed as:

8. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein step S2 minimizes the system energy consumption over T time slots, and the optimization problem is expressed as:

9. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein in step S3 explored experiences are stored in different experience replay pools according to their quality, and the pool is divided as follows: a threshold separating better experiences from poorer ones is found, and the experience replay pool is divided by this threshold into a better-experience pool and a poorer-experience pool.

10. The state updating method for efficient caching and task offloading in an unmanned aerial vehicle-assisted Internet of Vehicles according to claim 1, wherein in step S3 the minimization problem of step S2 is converted into a Markov decision process (MDP) defined by the tuple {S^c, A^c, T^c, R^c}, where S^c is the set of system states, A^c is the set of system actions, T^c = {p(s^c′ | s^c, a^c)} is the set of transition probabilities, R^c: S^c × A^c → ℝ is the reward function, and the policy π is a mapping from S^c to A^c, the MDP being defined as follows: