CN113326076A

CN113326076A - Vehicle-mounted fog-assisted vehicle fleet task unloading method based on semi-Markov decision process

Info

Publication number: CN113326076A
Application number: CN202110594462.0A
Authority: CN
Inventors: 吴琼; 王思远; 葛红梅
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-08-31
Anticipated expiration: 2041-05-28
Also published as: CN113326076B

Abstract

The invention relates to a vehicle-mounted fog-assisted fleet task offloading method based on a semi-Markov decision process (SMDP). The invention jointly considers the transmission delay, the computation delay and other factors in task offloading, and establishes a task offloading model based on a semi-Markov decision process. The system state set and action set are then defined, the system state transition probabilities and the system reward function are derived, and the SMDP model is solved with a value iteration algorithm based on the Bellman equation to obtain the optimal task offloading strategy. The scheme has moderate computational complexity and a reasonable system model, and fully accounts for how tasks are allocated and for the various delays involved in the offloading process. Simulation results show that the scheme achieves a larger long-term system benefit while guaranteeing the task offloading delay.

Description

Vehicle-mounted fog-assisted vehicle fleet task unloading method based on semi-Markov decision process

Technical Field

The invention relates to the technical field of vehicular task offloading, and in particular to a vehicle-mounted fog-assisted fleet task offloading method based on a semi-Markov decision process.

Background

With the development of science and technology, autonomous vehicle fleets are becoming increasingly common, which improves road safety. Each fleet consists of a head vehicle and several member vehicles. The head vehicle controls the speed, acceleration and direction of travel of the entire fleet, and the member vehicles follow the head vehicle one after another in the same lane at the same speed. Autonomous vehicles are typically equipped with cameras, radars and other sensors, which produce a large amount of redundant data, and the vehicle must compute on and analyze these data to extract useful information. According to data from Intel, a vehicle needs to compute, analyze and fuse a large amount of sensor data (about 1 GB/s) in order to make safe decisions. However, the computing power of a vehicle is limited, and the vehicle may be processing other tasks when a new task arrives. When its computing resources are occupied, the task must be offloaded to other vehicles in the fleet for cooperative processing, and the computation result is then returned to the requesting vehicle. Because the number of vehicles in the fleet is limited, the fleet itself may run short of resources. In that case the task cannot be processed in time, the result cannot be received in time, and the vehicle cannot make further safety decisions.

A vehicle-mounted fog computing (VFC) system can provide powerful computing resources to solve the above problems. A VFC system consists of a number of vehicles on the road, each of which can provide computing resources to assist in processing tasks. When fleet resources are scarce, a vehicle in the fleet can transmit its task to the head vehicle; the head vehicle offloads the task to vehicles of the VFC system travelling beside the fleet for cooperative processing; the VFC vehicles return the computation result to the head vehicle; and the head vehicle forwards the result to the requesting vehicle. The VFC system can thus support low-latency, real-time applications for fleet vehicles when fleet resources are tight, greatly increasing network capacity. Throughout task offloading, all vehicles communicate using the IEEE 802.11p distributed coordination function (DCF). For an autonomous vehicle, the offloading delay is critical: if the delay is too long, the vehicle receives the computation result only long after the task was offloaded, cannot react to the result in time, and the probability of a safety accident increases greatly. The offloading delay comprises the transmission delay, the computation delay and the return delay. The transmission delay is the time taken to send the task, the computation delay is the time the computing resources take to process the task, and the return delay is the time taken to feed back the computation result.

In a fleet, each vehicle typically has sufficient computing resources, so a single vehicle is enough to handle one computing task. However, the vehicles in the fleet have different computing resources, so the computation delay differs depending on which vehicle processes the task.

The vehicle-mounted fog generally consists of private cars, whose computing resources are limited and approximately equal in size. When offloading a task to the VFC, the present invention selects one or more vehicles to process it. Specifically, the head vehicle divides the task into several subtasks of equal size and transmits them to the corresponding vehicles in the VFC using the 802.11p DCF protocol. As the number of vehicles processing an offloaded task increases, the transmission delay incurred by the head vehicle in sending the subtasks to the corresponding VFC vehicles also increases, while the computation delay decreases because more computing resources process the task. In addition, vehicles arrive at and leave the VFC randomly, so the resources in the VFC change dynamically.

In resource-limited fleets and VFCs, an important problem is how to maximize the long-term benefit of the system while accounting for the choice of fleet vehicle to process a task, the number of VFC vehicles assigned to a task, and the random arrival and departure of VFC vehicles. This benefit depends on the transmission delay, the computation delay and the return delay.

At present, no research has addressed the optimization of fleet task offloading in a vehicle-mounted fog computing system under the 802.11p DCF protocol.

Disclosure of Invention

Therefore, the technical problem to be solved by the invention is to address the limited resources within a fleet, the dynamically changing resources in the vehicle-mounted fog, the diversity of tasks, the task offloading delay and related problems in the prior art.

In order to solve the technical problems, the invention provides a vehicle-mounted fog-assisted vehicle fleet task unloading method based on a semi-Markov decision process, which comprises the following steps:

step S1: defining a state set and an action set of an on-vehicle fog-assisted fleet task unloading system based on a semi-Markov decision process;

step S2: obtaining the current state s and the action a(s) of the system according to the state set and the action set of the system, and calculating the state transition probability of the system according to the current state s and the action a(s) of the system;

step S3: establishing a Bellman equation according to the system reward and the system state transition probability and solving an optimal unloading strategy in the system;

the system reward is the system reward for the system to take action a(s) in state s, expressed as the difference between the immediate gain of the system and the cost consumed during the transition of the system state, calculated as follows:

R(s,a)＝U(s,a)-G(s,a)

where U(s, a) represents the immediate gain of the system after the system takes action a(s) in state s, and G(s, a) represents the cost consumed by the system during the transition from the current state to the next state.

In one embodiment of the present invention, in step S2, the transition probability is calculated by separately computing and normalizing the transition probability when the fleet issues a task request, the transition probability when a task processed by a fleet vehicle leaves the system, the transition probability when a task processed by computing units in the vehicle-mounted fog leaves the system, the transition probability when a vehicle arrives at the system, and the transition probability when a vehicle leaves the system.

In one embodiment of the present invention, in step S3, the expression of the immediate gain is:

where η represents the unit price of time; E_l represents the time required for the task requester to process the task itself; T_p represents the transmission time of a task within the fleet; each task requires d computing resources; the computing resource of fleet vehicle V_i (i = 1, ..., N) is f_i; the head-vehicle-to-fog transmission time is the time taken by the head vehicle to transmit the task to j computing units in the vehicle-mounted fog for joint processing; and ζ is the penalty parameter.

In one embodiment of the present invention, T_p and the head-vehicle-to-fog transmission time share the same calculation formula, denoted T_tr:

T_tr＝θ·E_tr·T_slot

where θ represents the number of tasks the vehicle needs to transmit. Within the fleet its value is constantly 1; in the network formed by the head vehicle and the vehicle-mounted fog its value depends on the decision of the system: when the system assigns the task to j computing units, the head vehicle divides the task into j subtasks of equal size and transmits them to j vehicles in the vehicle-mounted fog, so θ = j. E_tr represents the average number of time slots required to transmit a task, and T_slot represents the average duration of each time slot.

In one embodiment of the invention, the average duration T_slot of each time slot is expressed as:

T_slot＝q_idle·T_idle+q_s·T_s+q_c·T_c

where q_idle denotes the probability that a time slot is idle; q_s denotes the probability that a time slot carries a successful data transmission; q_c = 1 - q_idle - q_s denotes the probability that a data collision occurs in a time slot; T_idle denotes the duration of an idle time slot; T_s = Header + E[P]/θ + SIFS + δ + ACK + DIFS + δ denotes the duration of a time slot in which data is transmitted successfully; and T_c = Header + E[P]/θ + SIFS + δ + ACKtimeout denotes the duration of a time slot in which a data collision occurs. Here Header = PHY_h + MAC_h denotes the header length of the data packet, equal to the sum of the physical-layer and medium-access-layer header lengths; E[P] denotes the length of the task; δ denotes the propagation delay; and SIFS, ACK and DIFS denote the lengths of the SIFS, ACK and DIFS, respectively. In addition, q denotes the probability that a collision occurs when a vehicle transmits data.
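The slot-duration formulas above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the probabilities q_idle and q_s are taken as inputs because their expressions are not reproduced in this text, all lengths are assumed to be already converted to durations in the same time unit, and the names `SlotTimings` and `average_slot_duration` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotTimings:
    """Durations in one common time unit (hypothetical container)."""
    t_idle: float       # T_idle: duration of an idle slot
    header: float       # Header = PHY_h + MAC_h
    payload: float      # E[P]: length of the whole task
    sifs: float
    difs: float
    ack: float
    ack_timeout: float  # ACKtimeout
    delta: float        # propagation delay


def average_slot_duration(q_idle: float, q_s: float, theta: int, t: SlotTimings) -> float:
    """T_slot = q_idle*T_idle + q_s*T_s + q_c*T_c with q_c = 1 - q_idle - q_s."""
    if theta < 1:
        raise ValueError("theta must be at least 1")
    q_c = 1.0 - q_idle - q_s
    if q_idle < 0 or q_s < 0 or q_c < -1e-12:
        raise ValueError("q_idle and q_s must be non-negative and sum to at most 1")
    # Each of the theta subtasks carries E[P]/theta of the payload.
    t_s = t.header + t.payload / theta + t.sifs + t.delta + t.ack + t.difs + t.delta
    t_c = t.header + t.payload / theta + t.sifs + t.delta + t.ack_timeout
    return q_idle * t.t_idle + q_s * t_s + q_c * t_c
```

Passing q_idle and q_s as arguments keeps the sketch independent of the channel model used to obtain them.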

In one embodiment of the invention, the average number of time slots E_tr required to transmit a task is expressed as:

where m is the number of retransmissions, q is the collision probability, W_minIs the minimum contention window.

In one embodiment of the present invention, in step S3, the consumption of the system is:

where α represents the continuous-time discount factor, and C(s, a) is the number of vehicles in the system that are processing tasks after the action is taken in the current state, expressed as:

β(x, a) represents the sum of the arrival rates of all events at the next moment in the system, expressed as:

In one embodiment of the present invention, the Bellman optimality equation is expressed as:

where the three normalized quantities are, respectively, the normalized system reward, the normalized discount factor, and the normalized transition probability, obtained by introducing a normalizing constant. Here N is the number of vehicles in the fleet; the arrival rate of event A at the next moment is Nλ_p; the arrival rates of events F_{+1} and F_{-1} are λ_v and μ_v, respectively; f_i is the computing resource of fleet vehicle V_i; M is the number of computing units in the vehicle-mounted fog; N_R is the maximum number of computing units that can be allocated to a single task in the vehicle-mounted fog; f_v is the computing resource of each vehicle in the vehicle-mounted fog; and d is the computing resource required by each computing task.

In an embodiment of the present invention, solving the optimal offloading strategy of the vehicle-mounted fog-assisted fleet task offloading system comprises: setting a positive number ε and iterating every state in the system using the Bellman optimality equation until, for every system state, the absolute difference between the state value functions of two successive iterations is smaller than a set value, at which point the iteration stops.

In an embodiment of the present invention, the optimal offload policy calculation expression is:

Compared with the prior art, the technical scheme of the invention has the following advantages:

The invention provides a vehicle-mounted fog-assisted fleet task offloading scheme based on a semi-Markov decision process, addressing the limited resources within a fleet, the dynamically changing resources in the vehicle-mounted fog, the diversity of tasks, the task offloading delay and related problems. First, tasks are transmitted using the IEEE 802.11p protocol, and the transmission delay, the computation delay and other factors in task offloading are considered, so that a task offloading model based on a semi-Markov decision process is established. The system state set and action set are then defined, the system state transition probabilities and the system reward function are derived, and the SMDP model is solved with a value iteration algorithm based on the Bellman equation to obtain the optimal task offloading strategy. The method has moderate computational complexity and a reasonable system model, and fully accounts for how tasks are allocated and for the various delays involved in the offloading process. Simulation results show that the scheme achieves a larger long-term system benefit while guaranteeing the task offloading delay.

Drawings

In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which

FIG. 1 is a system block diagram.

Fig. 2 is a graph of the long-term benefit obtained by the scheme of the invention compared with other schemes (λ_p = 20).

Fig. 3 is a graph of the long-term benefit obtained by the scheme of the invention compared with other schemes (λ_p = 13).

Detailed Description

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

The invention discloses a vehicle-mounted fog-assisted vehicle fleet task unloading method based on a semi-Markov decision process, which comprises the following steps of:

Step S1: The state set of the system is defined first. In the semi-Markov decision model, the system state comprises the task-processing state of each fleet vehicle, the numbers of tasks processed by different numbers of vehicles in the vehicle-mounted fog computing system, the total number of vehicles in the vehicle-mounted fog, and the current event. Fig. 1 shows the system framework. The state set of the system is represented as follows:

where n_i denotes the number of tasks being processed by vehicle V_i (i = 1, ..., N): n_i = 0 means that V_i is idle, and n_i = 1 means that V_i is processing a task; B_j denotes the number of tasks that are each jointly processed by j computing units in the vehicle-mounted fog; N_R denotes the maximum number of computing units that can be allocated to a single task in the vehicle-mounted fog; and M denotes the number of computing units in the vehicle-mounted fog and satisfies

e represents the currently occurring event.
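As an illustration of the state description above, a state could be held in a small Python record. This is a sketch under assumptions rather than the representation used by the invention: the field order, the event labels such as "A" or "F+1", and the class name `SystemState` are hypothetical, and the capacity check (busy fog units not exceeding M) is an assumed form of the constraint on M, whose expression is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SystemState:
    """One state (n_1..n_N, B_1..B_NR, M, e); field names are illustrative."""
    n: Tuple[int, ...]  # n[i-1] = n_i: 0 if fleet vehicle V_i is idle, 1 if busy
    b: Tuple[int, ...]  # b[j-1] = B_j: tasks each shared by j fog computing units
    m: int              # M: number of computing units in the vehicle-mounted fog
    event: str          # e: current event label, e.g. "A", "D1", "L2", "F+1", "F-1"

    def busy_fog_units(self) -> int:
        """Fog computing units currently occupied: sum over j of j * B_j."""
        return sum(j * b_j for j, b_j in enumerate(self.b, start=1))

    def is_valid(self) -> bool:
        occupancy_ok = all(n_i in (0, 1) for n_i in self.n)
        counts_ok = all(b_j >= 0 for b_j in self.b) and self.m >= 0
        # ASSUMPTION: the constraint on M is taken to be sum_j j * B_j <= M.
        return occupancy_ok and counts_ok and self.busy_fog_units() <= self.m
```

A frozen dataclass is hashable, so states of this kind can be used directly as dictionary keys when tabulating value functions.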

Step S2: a set of actions of the system is defined. From the collection

The relationship between the event and the action is expressed as follows:

where -1 indicates that the current event is the departure of a task processed by vehicle V_i, the departure of a task processed by j computing units in the vehicle-mounted fog, or a vehicle arriving at or leaving the vehicle-mounted fog, in which case the system takes no task action; b indicates that the fleet has generated a computing task while the computing resources of the system are scarce, so the system refuses to process the task; the fleet-assignment action indicates that the fleet has generated a computing task and the system assigns it to fleet vehicle V_i for processing; and the fog-assignment action indicates that the system transmits the task to j computing units in the vehicle-mounted fog for processing.

The event set comprises the following events, where N is the number of vehicles in the fleet: A denotes a computing task issued by a fleet vehicle; D_i (i = 1, ..., N) denotes completion of the task processed by fleet vehicle V_i; L_j (j = 1, ..., N_R) denotes that a task jointly processed by j computing units in the vehicle-mounted fog leaves the system; and F_{+1} and F_{-1} denote a vehicle arriving at and leaving the vehicle-mounted fog, respectively.

Step S3: system state transition probabilities are defined. In the semi-Markov decision process, the transition probability P (x | s, a) is calculated in five cases based on the current state s and the action a(s). Wherein the values of action a(s) are selected from the set

And (4) selecting. The specific expression is as follows:

(1) e = A, i.e. the fleet issues a task request.

The first case is when the fleet issues a task request and the system assigns the task to fleet vehicle V_i. The total number of vehicles in the fleet is unchanged, and the number of tasks processed by j (j = 1, ..., N_R) computing units in the vehicle-mounted fog is unchanged. At the next moment, the arrival rate of event A (a fleet vehicle issues a computing task) is Nλ_p, where N is the number of vehicles in the fleet and λ_p is the task arrival rate. The arrival rate of event L_j is B_j·j·f_v/d, where B_j is the number of tasks jointly processed by j computing units in the vehicle-mounted fog, f_v is the computing resource of each vehicle in the vehicle-mounted fog, and d is the computing resource required by each computing task. The arrival rates of events F_{+1} and F_{-1} are λ_v and μ_v, respectively. The arrival rate of the event that a task processed by a fleet vehicle leaves the system must be discussed in two sub-cases, where E denotes the event that occurs at the next moment.

1)

The current event is a task generated by the fleet and assigned to the i-th fleet vehicle, so the number of tasks processed by vehicle V_i increases by one. Hence, at the next moment, the arrival rate of the event that the task processed by vehicle V_i leaves the system is (n_i + 1)·f_i/d, where f_i is the computing resource of fleet vehicle V_i.

2)

The current event is a task generated by the fleet and assigned to the i-th fleet vehicle, so the number of tasks processed by vehicle V_i increases by one, while the next event is the departure of the task processed by the k-th fleet vehicle (k ≠ i). Because the number of tasks processed by vehicle V_k is unchanged, the arrival rate of event D_k is n_k·f_k/d.

The second case is when the fleet issues a task request and the system assigns the task to j vehicles in the vehicle-mounted fog for joint processing. The total number of vehicles in the fleet and the number of tasks they are processing are unchanged. At the next moment, the arrival rate of event A is Nλ_p, the arrival rate of event D_i (i = 1, ..., N) is n_i·f_i/d, and the arrival rates of events F_{+1} and F_{-1} are λ_v and μ_v, respectively. The arrival rate of the event that a task processed by j (j = 1, ..., N_R) computing units in the vehicle-mounted fog leaves the system must be discussed in two sub-cases.

1)

This sub-case indicates that the system assigns the task to j vehicles in the vehicle-mounted fog for joint processing, so the number of tasks jointly processed by j vehicles increases by one. Hence, at the next moment, the arrival rate of event L_j is (B_j + 1)·j·f_v/d.

2)

This sub-case indicates that the event occurring at the next moment is the departure of a task jointly processed by m vehicles in the vehicle-mounted fog (m ≠ j). Because the number of tasks jointly processed by m vehicles is unchanged, the arrival rate of event L_m at the next moment is B_m·m·f_v/d.
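The two cases above amount to updating the occupancy counts for the chosen action and then reading off one rate per possible next event. The Python sketch below illustrates this for a task-request event, with each transition probability taken as an event rate divided by the sum of all rates. The function names, the string event labels and the `("fleet", i)` / `("fog", j)` action encoding are hypothetical choices made for this example.

```python
from typing import Dict, Sequence, Tuple


def assign(n: Sequence[int], b: Sequence[int], target: str,
           index: int) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Occupancy after a task is given to fleet vehicle V_index or to `index` fog units."""
    n_new, b_new = list(n), list(b)
    if target == "fleet":
        n_new[index - 1] += 1   # n_i -> n_i + 1
    elif target == "fog":
        b_new[index - 1] += 1   # B_j -> B_j + 1
    else:
        raise ValueError("target must be 'fleet' or 'fog'")
    return tuple(n_new), tuple(b_new)


def next_event_rates(n: Sequence[int], b: Sequence[int], lambda_p: float,
                     lambda_v: float, mu_v: float, f: Sequence[float],
                     f_v: float, d: float) -> Dict[str, float]:
    """Arrival rate of every possible next event for post-action occupancy (n, b)."""
    rates = {"A": len(n) * lambda_p, "F+1": lambda_v, "F-1": mu_v}
    for i, n_i in enumerate(n, start=1):
        rates[f"D{i}"] = n_i * f[i - 1] / d   # task of fleet vehicle V_i departs
    for j, b_j in enumerate(b, start=1):
        rates[f"L{j}"] = b_j * j * f_v / d    # task shared by j fog units departs
    return rates


def transition_probabilities(rates: Dict[str, float]) -> Dict[str, float]:
    """Each probability is the event's rate over the sum of all event rates."""
    beta = sum(rates.values())
    if beta <= 0:
        raise ValueError("total event rate must be positive")
    return {event: rate / beta for event, rate in rates.items()}
```

For example, with two fleet vehicles, the call `assign((0, 1), (1, 0), "fleet", 1)` marks V_1 as busy before the rates are computed, which reproduces the (n_i + 1)·f_i/d term of the first case.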

In summary, given the system state s and the action a(s), the system state transition probability is expressed as:

(2) e = D_i, i.e. the task processed by vehicle V_i leaves the system. In this case, the system state transition probability is expressed as:

(3) e = L_j, i.e. a task jointly processed by j computing units in the vehicle-mounted fog leaves the system. In this case, the system state transition probability is expressed as:

(4) e = F_{+1}, i.e. a vehicle arrives at the system. In this case, the system state transition probability is expressed as:

(5) e = F_{-1}, i.e. a vehicle leaves the system. In this case, the system state transition probability is expressed as:

where β(x, a) represents the sum of the arrival rates of all events at the next moment in the system. The transition probability of an event is the ratio of its arrival rate at the next moment to the sum of the arrival rates of all events at the next moment. In summary, under the different system states s and actions a(s), β(x, a) is expressed as:

Step S4: The reward model of the system is defined. The system reward for taking action a(s) in state s is the difference between the immediate gain of the system and the cost consumed during the system state transition, expressed as follows:

R(s,a)＝U(s,a)-G(s,a)

where U(s, a) represents the immediate gain of the system after taking action a(s) in the current state, and G(s, a) represents the cost consumed by the system during the transition from the current state to the next state.

The calculation of the immediate gain can be divided into the following:

(1) When e = A and the system assigns the task to fleet vehicle V_i (i = 1, ..., N):

When the requesting vehicle issues a computing task and the system decides to offload it to fleet vehicle V_i for processing, the requesting vehicle transmits the task to the designated vehicle through intra-fleet communication using the DCF of the 802.11p protocol, and the immediate gain of the system is η(E_l - T_p - d/f_i). Here η represents the unit price of time, E_l represents the time required for the task requester to process the task itself, T_p represents the transmission time of the task within the fleet, d is the computing resource required by each computing task issued by a requesting vehicle, and f_i is the computing resource of fleet vehicle V_i (i = 1, ..., N).

(2) When e = A and the system assigns the task to j computing units in the vehicle-mounted fog (j = 1, ..., N_R):

When the system decides that j computing units in the vehicle-mounted fog process the task, the requesting vehicle transmits the task to the head vehicle through intra-fleet communication, and the head vehicle divides the task into j subtasks of equal size and transmits them to the j computing units in the vehicle-mounted fog. The immediate gain of the system is then equal to

where the fog transmission term represents the time taken by the head vehicle to transmit the task to the j computing units in the vehicle-mounted fog for joint processing.

(3) When e = A and a = b:

When the fleet issues a computing task and the computing resources in the system are scarce, the system refuses to process the task and is penalized; the penalty parameter is ζ.

(4) When e is the departure of a task processed by a fleet vehicle, the departure of a task processed in the vehicle-mounted fog, or a vehicle arrival:

The system takes no task action, and the immediate gain is zero.

(5) When e = F_{-1} and the departing vehicle is idle:

When an idle vehicle leaves the vehicle-mounted fog, the system takes no task action, and the immediate gain is zero.

(6) When e = F_{-1} and the departing vehicle is processing a task:

When a vehicle that is processing a task leaves the vehicle-mounted fog, the task processing fails and the system is penalized; the penalty parameter is ζ.
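The case analysis above can be sketched as a single Python function. This is a hedged illustration, not the invention's expression: the fog-assignment case is deliberately left out because its formula is not reproduced in this text, the penalty is assumed to enter the gain as -ζ, and the function name, the `"reject"` label and the `departing_busy` flag are hypothetical.

```python
from typing import Sequence, Union


def immediate_gain(event: str, action: Union[int, str, None], eta: float,
                   e_l: float, t_p: float, d: float, f: Sequence[float],
                   zeta: float, departing_busy: bool = False) -> float:
    """Immediate gain U(s, a) for the cases whose expressions are stated in the text."""
    if event == "A":
        if action == "reject":
            # ASSUMPTION: a penalty with parameter zeta is modelled as a gain of -zeta.
            return -zeta
        if isinstance(action, int):
            # Task assigned to fleet vehicle V_i (1-based): eta * (E_l - T_p - d / f_i).
            return eta * (e_l - t_p - d / f[action - 1])
        raise NotImplementedError("fog-assignment gain is not given in the text")
    if event == "F-1" and departing_busy:
        # A fog vehicle leaves while processing a task: the task fails.
        return -zeta
    # Task departures, vehicle arrivals and idle departures: no task action taken.
    return 0.0
```

Raising `NotImplementedError` for the fog case keeps the sketch from guessing a formula that the text does not supply.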

In summary, the expression of the system immediate gain is:

The transmission delay T_p and the head-vehicle-to-fog transmission time remain to be determined. The vehicles in the fleet transmit tasks in unicast mode using the DCF of the 802.11p protocol, and the network formed by the fleet's head vehicle and the mobile vehicle-mounted fog also transmits tasks in unicast mode using the DCF of the 802.11p protocol. Only the number of vehicles in the two networks differs, so the two delays share the same calculation formula. Let N_tr denote the number of vehicles in the network: for the fleet network, N_tr equals the total number of fleet vehicles N; for the network formed by the head vehicle and the vehicle-mounted fog, N_tr = M + 1. The expression of the network transmission delay is derived next. For convenience, T_tr denotes the transmission delay, and the formula applies to both T_p and the head-vehicle-to-fog transmission time.

The network transmission delay T_tr is expressed as:

T_tr＝θ·E_tr·T_slot

where θ represents the number of tasks the vehicle needs to transmit. Within the fleet its value is constantly 1; in the network formed by the head vehicle and the vehicle-mounted fog its value depends on the decision of the system: when the system assigns the task to j computing units, the head vehicle divides the task into j subtasks of equal size and transmits them to j vehicles in the vehicle-mounted fog, so θ = j. E_tr represents the average number of time slots required to transmit a task, and T_slot represents the average duration of each time slot.
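The product form of T_tr can be checked with a few lines of Python. The sketch assumes that E_tr and T_slot have already been computed for the network in question; the function name `transmission_delay` is hypothetical.

```python
def transmission_delay(theta: int, e_tr: float, t_slot: float) -> float:
    """T_tr = theta * E_tr * T_slot.

    theta  -- number of (sub)tasks to transmit: 1 inside the fleet,
              j when the head vehicle sends j subtasks to the fog
    e_tr   -- average number of time slots needed per transmission
    t_slot -- average duration of one time slot
    """
    if theta < 1:
        raise ValueError("theta must be at least 1")
    if e_tr < 0 or t_slot < 0:
        raise ValueError("e_tr and t_slot must be non-negative")
    return theta * e_tr * t_slot


# Inside the fleet a single task is sent, so theta = 1.
t_p = transmission_delay(theta=1, e_tr=10.0, t_slot=0.5)
# Head vehicle to fog with the task split into j = 3 subtasks, so theta = 3.
t_fog = transmission_delay(theta=3, e_tr=4.0, t_slot=0.25)
```

Note that a larger θ does not simply scale the delay: T_slot itself depends on θ through the E[P]/θ payload term, so the two inputs should be computed for the same θ.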

The average duration T_slot of each time slot is expressed as:

T_slot＝q_idle·T_idle+q_s·T_s+q_c·T_c

where q_idle denotes the probability that a time slot is idle; q_s denotes the probability that a time slot carries a successful data transmission; q_c = 1 - q_idle - q_s denotes the probability that a data collision occurs in a time slot; T_idle denotes the duration of an idle time slot; T_s = Header + E[P]/θ + SIFS + δ + ACK + DIFS + δ denotes the duration of a time slot in which data is transmitted successfully; and T_c = Header + E[P]/θ + SIFS + δ + ACKtimeout denotes the duration of a time slot in which a data collision occurs. Here Header = PHY_h + MAC_h denotes the header length of the data packet, equal to the sum of the physical-layer and medium-access-layer header lengths; E[P] denotes the length of the task; δ denotes the propagation delay; and SIFS, ACK and DIFS denote the lengths of the SIFS, ACK and DIFS, respectively. In addition, q denotes the probability that a collision occurs when a vehicle transmits data.

The average number of time slots E_tr required to transmit a task is expressed as:

The system consumption G (s, a) is:

where α represents a continuous time discount factor, C (s, a) is the number of vehicles in the system that are processing the task after taking action in the current state, and the calculation expression is:

step S5: and solving an optimal unloading strategy in the vehicle-mounted fog-assisted vehicle fleet task unloading system. And solving the optimal unloading scheme by adopting a value iterative algorithm according to the Bellman equation.

The bellman optima equation can be expressed as:

where the three normalized quantities are, respectively, the normalized system reward, the normalized discount factor, and the normalized transition probability, which are obtained by introducing a normalizing constant.
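The normalization described above resembles the textbook uniformization of a discounted continuous-time decision model with exponential sojourn times. The Python sketch below shows only that textbook construction; whether the invention's normalized quantities take exactly this form is an assumption, since their expressions are not reproduced in this text. The names `uniformize`, `y` (the normalizing constant, assumed to be no smaller than any total event rate β) and `same_state` are illustrative.

```python
from typing import Dict, Hashable, Tuple


def uniformize(reward: float, beta: float, p: Dict[Hashable, float],
               alpha: float, y: float,
               same_state: Hashable) -> Tuple[float, float, Dict[Hashable, float]]:
    """Textbook uniformization of one (state, action) pair.

    reward     -- R(s, a) of the continuous-time model
    beta       -- total event rate beta for this state and action
    p          -- transition probabilities P(x | s, a), keyed by next state x
    alpha      -- continuous-time discount factor
    y          -- normalizing constant with y >= beta (ASSUMPTION)
    same_state -- key of the current state s, which receives the self-loop mass
    """
    if alpha <= 0 or beta <= 0 or y < beta:
        raise ValueError("need alpha > 0, beta > 0 and y >= beta")
    r_bar = reward * (alpha + beta) / (alpha + y)   # normalized reward
    gamma = y / (y + alpha)                         # normalized discount factor
    # Moves to other states are slowed down by the factor beta / y ...
    p_bar = {x: prob * beta / y for x, prob in p.items() if x != same_state}
    # ... and the leftover probability mass becomes a fictitious self-transition.
    p_bar[same_state] = 1.0 - (1.0 - p.get(same_state, 0.0)) * beta / y
    return r_bar, gamma, p_bar
```

After this step every state-action pair shares the same event rate `y`, which is what allows an ordinary discrete-time value iteration to be applied.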

A very small positive number ε is set, and every state in the system is iterated using the above Bellman optimality equation until, for every system state, the absolute difference between the state value functions of two successive iterations is less than ε(1 - γ)/(2γ), at which point the iteration stops. Each system state then obtains its optimal strategy π*(s) according to the following formula.

The pseudo-code of the value iteration algorithm is as follows:

Algorithm 1: Value iteration algorithm

Step 1: For each system state s ∈ S, initialize the state value function, set a very small positive number ε, and set the iteration count k = 0;

Step 2: For each system state, compute the state value function using the Bellman equation;

Step 3: If the stopping condition is satisfied for every system state s ∈ S, go to Step 4; otherwise, increment the iteration count, i.e. k = k + 1, and return to Step 2;

Step 4: For each system state s ∈ S, compute the optimal offloading strategy π*(s).
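The four steps of the algorithm can be sketched in Python as a generic value iteration. This is a minimal illustration under assumptions: the value function is initialized to zero, γ is taken to be the normalized discount factor, the normalized reward and transition probabilities are supplied by the caller as the hypothetical callables `r_bar` and `p_bar`, and the stopping threshold is the ε(1 - γ)/(2γ) bound stated in the text.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def value_iteration(states: Sequence[State],
                    actions: Callable[[State], Sequence[Action]],
                    r_bar: Callable[[State, Action], float],
                    p_bar: Callable[[State, Action], Dict[State, float]],
                    gamma: float,
                    eps: float) -> Tuple[Dict[State, float], Dict[State, Action]]:
    """Return (state values, greedy policy) for a discounted model."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")

    def q_value(s: State, a: Action, v: Dict[State, float]) -> float:
        # Right-hand side of the Bellman equation for one action.
        return r_bar(s, a) + gamma * sum(p * v[x] for x, p in p_bar(s, a).items())

    # Step 1: initialise the value function (zero is an ASSUMPTION).
    v = {s: 0.0 for s in states}
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        # Step 2: one Bellman backup for every state.
        v_next = {s: max(q_value(s, a, v) for a in actions(s)) for s in states}
        # Step 3: stop once the largest change falls below the threshold.
        gap = max(abs(v_next[s] - v[s]) for s in states)
        v = v_next
        if gap < threshold:
            break
    # Step 4: extract the greedy action in every state.
    policy = {s: max(actions(s), key=lambda a: q_value(s, a, v)) for s in states}
    return v, policy
```

Taking the model as callables rather than dense arrays keeps the sketch independent of how the states of the offloading system are enumerated.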

The convergence error in the value iteration algorithm is:

Fig. 2 and Fig. 3 show that, under different task arrival rates, the offloading scheme of the present invention outperforms the offloading scheme based on the greedy algorithm, that is, the scheme of the present invention obtains a larger long-term system benefit.

In this method, tasks are first transmitted using the IEEE 802.11p protocol, and the transmission delay, the computation delay and other factors in task offloading are considered, so that a task offloading model based on a semi-Markov decision process is established. The system state set and action set are then defined, the system state transition probabilities and the system reward function are derived, and the SMDP model is solved with a value iteration algorithm based on the Bellman equation to obtain the optimal task offloading strategy. The method has moderate computational complexity and a reasonable system model, and fully accounts for how tasks are allocated and for the various delays involved in the offloading process. Simulation results show that the scheme achieves a larger long-term system benefit while guaranteeing the task offloading delay.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims

1. The vehicle-mounted fog-assisted vehicle fleet task unloading method based on the semi-Markov decision process is characterized by comprising the following steps of:

R(s,a) = U(s,a) - G(s,a)

2. The vehicle-mounted fog-aided fleet task offloading method based on a semi-Markov decision process of claim 1, wherein in step S2, the transition probability is calculated by separately calculating and normalizing: the transition probability when the fleet sends a task request, the transition probability when a task processed by a vehicle leaves the system, the transition probability when a task processed by a computing unit in the vehicle-mounted fog leaves the system, the transition probability when a vehicle arrives at the system, and the transition probability when a vehicle leaves the system.

3. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 1, wherein in step S3, the immediate gain is expressed as:

4. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 3, wherein T_p and

have the same calculation formula, denoted by T_tr:

T_tr = θ·E_tr·T_slot

wherein, when the head vehicle divides the task into j subtasks of equal size and transmits them respectively to j vehicles in the vehicle-mounted fog, θ takes the value j; E_tr represents the average number of time slots required to transmit the task; and T_slot represents the average duration of each time slot.
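As a rough numerical illustration of the transmission delay T_tr = θ·E_tr·T_slot, the sketch below also computes T_slot as the probability-weighted mean of the idle, successful, and collided slot durations. The function names and every numeric input are hypothetical placeholders chosen for easy arithmetic; they are not IEEE 802.11p parameters and are not values from the invention.

```python
def slot_duration(q_idle, q_s, t_idle, t_s, t_c):
    """Average slot duration: q_idle*T_idle + q_s*T_s + q_c*T_c."""
    q_c = 1.0 - q_idle - q_s  # probability that the slot holds a collision
    return q_idle * t_idle + q_s * t_s + q_c * t_c


def transmission_delay(theta, e_tr, t_slot):
    """Transmission delay T_tr = theta * E_tr * T_slot."""
    return theta * e_tr * t_slot


# Hypothetical inputs (arbitrary time units).
t_slot = slot_duration(q_idle=0.5, q_s=0.4, t_idle=1.0, t_s=10.0, t_c=8.0)
t_tr = transmission_delay(theta=2, e_tr=3.0, t_slot=t_slot)
print(t_slot, t_tr)
```

With these placeholder inputs the average slot lasts about 5.3 time units, so sending a task split into two subtasks that each need three slots on average takes about 31.8 time units.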

5. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 4, wherein the average duration of each time slot, T_slot, is expressed as:

T_slot = q_idle·T_idle + q_s·T_s + q_c·T_c

wherein q_idle denotes the probability that the time slot is idle; q_s denotes the probability that the time slot carries a successful data transmission; q_c = 1 - q_idle - q_s denotes the probability that the time slot is in a data collision; T_idle denotes the duration of each idle time slot; T_s = Header + E[P]/θ + SIFS + δ + ACK + DIFS + δ denotes the duration of a time slot in which data is transmitted successfully; and T_c = Header + E[P]/θ + SIFS + δ + ACKtimeout denotes the duration of a time slot in which a data collision occurs; wherein Header = PHY_h + MAC_h denotes the header length of the data packet, equal to the sum of the header lengths of the physical layer and the medium access layer of the data packet; E[P] denotes the length of the task; δ denotes the propagation delay; and SIFS, ACK, and DIFS denote the lengths of SIFS, ACK, and DIFS, respectively; and

6. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 4, wherein the average number of time slots E_tr required for the transmission of a task is expressed as:

7. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 1, wherein in step S3, the system consumption is:

8. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 7, wherein the Bellman optimality equation is expressed as:

wherein

represents the normalized system reward, and

represents the normalized discount factor.

9. The semi-Markov decision process vehicle fog-assisted fleet task offloading method according to claim 1 or 8, wherein said solving for an optimal offloading strategy in a vehicle fog-assisted fleet task offloading system comprises: setting a positive number ε, and iterating over each state in the system based on the Bellman optimality equation until, for every system state, the absolute difference between the state value functions of two successive iterations is smaller than a set value, at which point the iteration stops.

10. The semi-Markov decision process vehicle fog-assisted fleet task offloading method of claim 9, wherein the optimal offloading strategy is calculated by the expression: