CN113312105B - Vehicle task part unloading strategy method based on Q learning - Google Patents


Info

Publication number
CN113312105B
CN113312105B (application number CN202110619282.3A)
Authority
CN
China
Prior art keywords
task
tasks
learning
vehicle
unloading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110619282.3A
Other languages
Chinese (zh)
Other versions
CN113312105A (en)
Inventor
赵海涛
韩哲
王滨
张晖
倪艺洋
朱洪波
张峰
王星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Nanjing University of Posts and Telecommunications
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd and Nanjing University of Posts and Telecommunications
Priority to CN202110619282.3A
Publication of CN113312105A
Application granted
Publication of CN113312105B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44594 Unloading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a Q-learning-based partial offloading strategy method for vehicle tasks, applied to a vehicular ad hoc network, comprising the following steps. First, the tasks requested by mobile vehicle terminals are classified and the two extreme task types are removed: delay-critical tasks are computed locally, while tasks requiring a large amount of computing resources are offloaded entirely to the MEC server. Second, for the remaining services whose type is not easily judged, a task classification factor β_n is defined to screen out tasks that are less delay-sensitive and require a moderate amount of computing resources; Q-learning-based partial offloading is then performed on the screened tasks. Finally, once the offloading decisions for all tasks requested by the mobile vehicle user terminals are determined, computing resources are allocated to the users within each MEC server. The strategy method of the invention makes full use of local resources and server resources and reduces the total overhead of the system.

Description

Vehicle task part unloading strategy method based on Q learning
Technical Field
The invention belongs to the field of the Internet of Vehicles, and particularly relates to a Q-learning-based partial offloading strategy method for vehicle tasks in a vehicular ad hoc network.
Background
With the development of the intelligent automobile industry, Intelligent Transport Systems (ITS) have become a research hotspot, and autonomous vehicle control and path planning will play an increasingly broad role in future intelligent transportation. Future autonomous vehicles will be equipped with many sensors that collect data related to the services of mobile vehicle terminals in the vehicle's surroundings, and many intelligent services in intelligent transportation not only rely on the data collected by these sensors but must also be delivered with low delay. Implementing intelligent transportation requires different sensors to collect different kinds of data, such as energy consumption, environmental characteristics, vehicle status, and driver fatigue level; if these data were processed separately, they would not only consume significant computing resources but also harm the timeliness and reliability of the intelligent services provided.
Mobile edge computing offloads complex tasks requested by the mobile vehicle terminal to an MEC server for computation and storage; compared with offloading to a cloud computing center, this shortens the transmission distance and reduces delay. Computation offloading is currently an active research topic among scholars at home and abroad. The offloading process is affected by many factors, such as the performance of the mobile vehicle device, the quality of the backhaul link, and the condition of the radio channel, so the key to computation offloading is finding a suitable offloading decision, which mainly depends on the delay and energy consumption required to compute the requested task locally or to offload it to the MEC server. As the concepts of recursive task decomposition and parallel computing have emerged, researchers have begun to focus on partial computation offloading in mobile edge computing. This technique lets the MEC server and the mobile vehicle terminal compute in parallel to reduce delay and optimize resource use: a task requested by the vehicle terminal is divided into two parts, one computed locally and the other offloaded to the MEC server. Compared with traditional computation offloading, the key problem of partial offloading is deciding how much of the task should be offloaded to the MEC server so that the total overhead is minimized.
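The parallel split described above can be sketched as a simple delay model. This is an illustration, not the patent's own formulation: the function, its parameters, and the linear transmission/compute model are assumptions, with the local and offloaded parts finishing in parallel so the completion time is the maximum of the two paths.

```python
def partial_offload_delay(d, c, rho, f_local, f_mec, rate):
    """Completion time when a fraction rho of a task is offloaded.

    d: task data size (bits); c: CPU cycles needed per bit;
    f_local, f_mec: local and MEC CPU frequencies (cycles/s);
    rate: uplink transmission rate (bits/s).
    The local and offloaded parts execute in parallel, so the
    task finishes when the slower of the two paths finishes.
    """
    t_local = (1 - rho) * d * c / f_local          # compute kept locally
    t_offload = rho * d / rate + rho * d * c / f_mec  # upload + remote compute
    return max(t_local, t_offload)
```

Offloading a larger fraction helps only while the MEC path (transmission plus remote compute) remains faster than local computation, which is why the split ratio is worth optimizing.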
ZL2019101438105 discloses a task offloading method based on mobile edge computing that minimizes vehicle energy consumption; it combines a maximum energy-saving selection algorithm with a short-term path prediction algorithm to satisfy the delay constraint while minimizing offloading energy consumption, but its offloading decision algorithm is too complex to compute efficiently.
ZL2020101714540 discloses a task offloading method and device based on a mobile edge computing scenario: the method collects task information from device terminals, uploads it to an edge server through a Small Cell base station, sets an optimization objective of minimum system overhead, and repeatedly decomposes and solves the resulting optimization problem. Although this approach significantly reduces the total overhead of the system, adjusting the offloading ratio while the vehicle is moving remains inefficient in terms of energy consumption.
Disclosure of Invention
In order to overcome the above shortcomings of the prior art, the invention provides a Q-learning-based partial offloading method for a vehicular ad hoc network that makes full use of local resources and server resources, thereby reducing the total overhead of the system.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
the invention relates to a partial unloading strategy method based on Q learning, which comprises the following steps:
s1: task classification is carried out on tasks requested by a mobile vehicle terminal, two extreme task types are eliminated, tasks with extremely sensitive time delay are directly locally unloaded, and tasks with large calculation resource amount are all unloaded to an MEC server for calculation;
s2: for the rest business of which the type is not easy to judge, defining the task classification factor as beta n Screening out tasks with less sensitive time delay and general calculation resource quantity;
s3: partial unloading based on Q learning is carried out on the screened tasks, so that an optimal strategy is obtained;
s4: after the offloading decision of the task requested by all the mobile vehicle user terminals is determined, the computing resources are allocated to the users in each MEC server.
Further, step S1 specifically includes: classifying the tasks requested by the mobile vehicle terminal. Two extreme, easily judged task types are considered. One is the delay-critical task, most commonly the safety message task, which is computed directly at the local terminal; the other is the task requiring a large amount of computing resources, i.e. more than 2/3 of the mobile vehicle terminal's own computing capacity, which is generally a map-type message task and is offloaded directly to the MEC server for computation.
Further, step S2 specifically includes: roughly judging the specific service type according to the task's remaining delay tolerance and required amount of computing resources, defining a task classification factor β_n, and selecting the tasks that are less delay-sensitive and require a moderate amount of computing resources for partial offloading.
Here, the task classification factor β_n is defined as
β_n = d_n C_n / T_n^max, n = 1, 2, …, N,
where d_n is the data size of message task n, C_n is the amount of computing resources required per unit of task data, T_n^max is the maximum tolerable delay for completing the task, and N is the total number of tasks.
Further, step S3 specifically includes: performing Q-learning-based partial offloading on the tasks that are less delay-sensitive and require a moderate amount of computing resources. The value of the action taken by a participant in each state is denoted Q(s, a); it reflects the feedback of the environment in the current state s when action a is performed, and thus measures the benefit of the current policy π. The Q values are stored in the form of a Q table, and, without any prior knowledge of the environment, the Q table is learned by iterative updates so that the value function Q(s, a) approaches the target function Q*(s, a), thereby obtaining the optimal strategy π*.
The invention provides a Q-learning-based partial offloading strategy method. It classifies the tasks of the vehicle terminal: first, the two extreme cases, delay-critical tasks and tasks with an extremely large computing demand, are removed; then the remaining tasks are classified according to their delay and computing-resource requirements, and partial offloading is performed on the tasks that are less delay-sensitive and have a moderate computing demand so as to reduce the total overhead of the system; finally, the optimal offloading decision is obtained with Q learning according to a multi-objective optimization model.
The beneficial effects of the invention are as follows: the invention provides a Q-learning-based partial offloading strategy method for the Internet of Vehicles that makes full use of local resources and server resources, thereby reducing the total overhead of the system.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
Embodiments of the invention are disclosed in the drawings, and for purposes of explanation, numerous practical details are set forth in the following description. However, it should be understood that these practical details are not to be taken as limiting the invention.
The invention discloses a Q-learning-based partial offloading strategy method for vehicle tasks in a vehicular ad hoc network, comprising the following steps. First, the tasks requested by the mobile vehicle terminal are classified and the two extreme task types are removed: delay-critical tasks are computed locally, while tasks requiring a large amount of computing resources are offloaded entirely to the MEC server. Second, for the remaining services whose type is not easily judged, a task classification factor β_n is defined, the tasks that are less delay-sensitive and require a moderate amount of computing resources are screened out, and Q-learning-based partial offloading is performed on the screened tasks. Finally, after the offloading decisions of all tasks requested by the mobile vehicle user terminals are determined, computing resources are allocated to the users within each MEC server.
As shown in fig. 1, the Q-learning-based vehicle task partial offloading method in a vehicular ad hoc network according to the present invention specifically includes the following steps:
s1: for tasks requested by a mobile vehicle terminal, task classification is carried out, firstly, two extreme task types which are easy to judge are considered, one task is a task with extremely sensitive time delay, namely a safe message task, and the task is directly unloaded locally; the other is a task requiring a large amount of computing resources, namely a task requiring the computing resource capacity of the mobile vehicle terminal itself with the computing resource amount of >2/3, which is generally a map-like message task, and the task is directly offloaded to the MEC server for computing.
S2: for the rest business which is not easy to pass through the time delay tolerance and the calculation resource quantity of the task to simply judge the specific type, adopting the definition task classification factor as beta n Some of the time delays are not sensitive, the task with common computing resource quantity is partially unloaded, the complexity of the partial unloading decision algorithm can be reduced by the classification mode, and the task classification factor beta is defined n Can be expressed as
Figure GDA0003844964750000051
Wherein d is n For message task data size, C n The amount of resources required for a unit message task size,
Figure GDA0003844964750000052
to accomplish the maximum tolerable latency for the task, N represents a total of N tasks.
Define τ_1 and τ_2 as thresholds, and let a_nm be the offloading decision variable, a_nm ∈ {0, 1}. When β_n > τ_2, a_nm = 0 and the task is computed locally; when β_n < τ_1, a_nm = 1 and the task is offloaded to the MEC server; when τ_1 < β_n < τ_2, the task is partially offloaded.
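Steps S1 and S2 can be sketched as a single decision function. This is an illustrative reading of the patent: the form β_n = d_n C_n / T_n^max follows the stated variable definitions (the published formula is rendered as an image), and the task fields, thresholds, and function names are assumptions.

```python
def beta(d_n, C_n, T_max):
    # Task classification factor: required compute relative to the
    # tolerable delay (reconstructed form, see lead-in).
    return d_n * C_n / T_max

def offload_decision(task, tau1, tau2, local_capacity):
    """Return 'local', 'mec', or 'partial' following steps S1-S2.

    task: dict with illustrative keys d (data size), C (compute per
    unit data), T_max (delay bound), safety (delay-critical flag).
    """
    # S1: the two extreme, easily judged types.
    if task["safety"]:                                   # safety message: compute locally
        return "local"
    if task["d"] * task["C"] > (2 / 3) * local_capacity:  # compute-heavy, e.g. map data
        return "mec"
    # S2: threshold test on the classification factor.
    b = beta(task["d"], task["C"], task["T_max"])
    if b > tau2:
        return "local"    # a_nm = 0
    if b < tau1:
        return "mec"      # a_nm = 1
    return "partial"      # tau1 < beta_n < tau2: handled by Q learning
```

Only tasks that land in the middle band proceed to the Q-learning stage, which keeps the learning problem small.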
S3: partial unloading based on Q learning is carried out on the screened tasks, so that an optimal strategy is obtained;
in particular, the value Q of the action taken by the participant in each state may be denoted as Q (s, a), which reflects the feedback of the environment to the current state s when action a is performed, thereby measuring the degree of benefit of the current policy pi. The Q values are stored in the form of a Q table, and the value function Q (s, a) is approximated to the target function Q (s, a) by iteratively and updatably learning the Q table without any prior knowledge about the environment, thereby obtaining an optimal strategy pi *
Assuming that an offloading decision is made in one time slot of the network, the action of the mobile vehicle users in this slot is defined as
a = {a_1m, a_2m, …, a_Nm}.
In the method, purely random action selection would unbalance the load on the MEC servers, so actions are selected in a greedy manner with exploration rate ε. The action selection can be expressed as
a = argmax_a Q(s, a) with probability 1 − ε, or a random action with probability ε,
where V is the vehicle user set, V = {V_1, V_2, …, V_n}.
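The greedy selection with exploration rate ε described above is the standard ε-greedy rule. A minimal sketch follows; the dictionary encoding of the Q table and the state/action representations are assumptions, not from the patent.

```python
import random

def epsilon_greedy(Q, state, actions, eps):
    """Pick a random action with probability eps, else the greedy one.

    Q: dict mapping (state, action) -> value, i.e. the Q table;
    actions: candidate offloading actions for this time slot.
    """
    if random.random() < eps:
        return random.choice(actions)                              # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))      # exploit
```

With eps = 0 the rule is purely greedy; a small positive eps keeps the learner visiting under-explored MEC servers, which is what counteracts the load imbalance mentioned above.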
The optimal selection strategy is
π*(s) = argmax_a Q(s, a),
with the state-action value satisfying
Q(s, a) = g(s, a) + η max_{a'} Q(s', a'),
where g(s, a) is the reward for performing action a in the current state s.
The Q function is adopted as the evaluation function, and the maximum total expected return generated after learning is
Q*(s, a) = E[ Σ_t η^t g(s_t, a_t) ],
where η is a learning parameter satisfying 0 ≤ η ≤ 1, so that the update formula of the Q value can be expressed as
Q(s, a) ← Q(s, a) + α [ g(s, a) + η max_{a'} Q(s', a') − Q(s, a) ],
where α is defined as the learning rate, which indicates how much of the user's reward is learned in this state. Since the goal of the method is to minimize the total overhead of the system, the reward for one time slot is
g = −(λ_1 T_total + λ_2 E_total),
where T_total is the time required to complete all computing tasks, E_total is the energy required to complete all tasks, and λ_1, λ_2 are weights satisfying λ_1 + λ_2 = 1.
It can be seen that learning requires only the current state: Q(s, a) is updated by continuous iteration until convergence, at which point the corresponding value of the user offloading decision variable a_nm is optimal.
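One step of the update described in S3 can be sketched as follows. The reward is the negative weighted system overhead, so maximizing return minimizes total delay-plus-energy cost; the function signature, default parameter values, and dictionary Q table are illustrative assumptions.

```python
def q_update(Q, s, a, s_next, actions,
             T_total, E_total, lam1=0.5, alpha=0.1, eta=0.9):
    """One Q-learning step for the offloading decision.

    Q: dict mapping (state, action) -> value; actions: candidate actions
    in the next state; T_total, E_total: delay and energy overhead of
    this slot; lam1: delay weight (energy weight is 1 - lam1);
    alpha: learning rate; eta: learning/discount parameter.
    """
    g = -(lam1 * T_total + (1 - lam1) * E_total)        # reward: negative overhead
    best_next = max(Q.get((s_next, ap), 0.0) for ap in actions)
    old = Q.get((s, a), 0.0)
    # Q(s,a) <- Q(s,a) + alpha * [g + eta * max_a' Q(s',a') - Q(s,a)]
    Q[(s, a)] = old + alpha * (g + eta * best_next - old)
    return Q[(s, a)]
```

Repeating this update over many slots (with ε-greedy action selection) drives the Q table toward the fixed point whose greedy policy is the optimal offloading strategy.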
S4: after the unloading decision of the tasks requested by all the mobile vehicle user terminals is determined, computing resource allocation is continuously carried out on the users in each MEC server, the resource allocation problem is a convex optimization problem, and the optimal solution of the task can be obtained through a Lagrange multiplier method according to KKT conditions.
The foregoing description is only illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present invention, should be included in the scope of the claims of the present invention.

Claims (3)

1. A Q-learning-based vehicle task partial offloading strategy method, characterized in that the method comprises the following steps:
s1: task classification is carried out on tasks requested by the mobile vehicle terminal, wherein the task classification firstly considers two extreme task types which are easy to judge, namely a task with extremely sensitive time delay, namely a safe message task, the task is directly unloaded locally, and the task with large calculation resource quantity is needed, namely the task with the calculation resource capacity of the mobile vehicle terminal per se with the needed calculation resource quantity of more than 2/3, and the task is directly unloaded to an MEC server for calculation;
s2: for the rest service definition task classification factor of the type which is not easy to judge is beta n Screening out tasks with less sensitive time delay and general calculation resource quantity, and defining task classification factor beta n Represented as
Figure FDA0003959576760000011
Wherein d is n For message task data size, C n The amount of resources required for a unit message task size,
Figure FDA0003959576760000012
to achieve the maximum tolerable latency for the task, N represents a total of N tasks,
definition τ 1 ,τ 2 For the threshold value, assume a nm To offload decision variables, where a nm E (0, 1), when beta n >τ 2 I.e. a nm =0, which is offloaded to the local for calculation; when beta is n <τ 1 I.e. a nm =1, which is offloaded to MEC server for computation, when τ 1 <β n <τ 2 When in use, the device is partially unloaded;
s3: partial unloading based on Q learning is carried out on the screened tasks, so that an optimal strategy is obtained;
s4: after the offloading decision of the task requested by all the mobile vehicle user terminals is determined, the computing resources are allocated to the users in each MEC server.
2. The Q-learning-based vehicle task partial offloading strategy method according to claim 1, characterized in that step S3 specifically comprises: performing Q-learning-based partial offloading on the tasks that are less delay-sensitive and require a moderate amount of computing resources; the value of the action taken by a participant in each state is denoted Q(s, a), which reflects the feedback of the environment in the current state s when action a is performed and thus measures the benefit of the current policy π; the Q values are stored in the form of a Q table, and without any prior knowledge of the environment the Q table is learned by iterative updates so that the value function Q(s, a) approaches the target function Q*(s, a), thereby obtaining the optimal strategy π*.
3. The Q-learning-based vehicle task partial offloading strategy method according to claim 1, characterized in that the method is applied to a vehicular ad hoc network.
CN202110619282.3A 2021-06-03 2021-06-03 Vehicle task part unloading strategy method based on Q learning Active CN113312105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110619282.3A CN113312105B (en) 2021-06-03 2021-06-03 Vehicle task part unloading strategy method based on Q learning

Publications (2)

Publication Number Publication Date
CN113312105A CN113312105A (en) 2021-08-27
CN113312105B true CN113312105B (en) 2023-05-02

Family

ID=77377270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110619282.3A Active CN113312105B (en) 2021-06-03 2021-06-03 Vehicle task part unloading strategy method based on Q learning

Country Status (1)

Country Link
CN (1) CN113312105B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372314A (en) * 2020-03-12 2020-07-03 湖南大学 Task unloading method and task unloading device based on mobile edge computing scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710336B (en) * 2019-01-11 2021-01-05 中南林业科技大学 Mobile edge computing task scheduling method based on joint energy and delay optimization
CN111918311B (en) * 2020-08-12 2022-04-12 重庆邮电大学 Vehicle networking task unloading and resource allocation method based on 5G mobile edge computing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372314A (en) * 2020-03-12 2020-07-03 湖南大学 Task unloading method and task unloading device based on mobile edge computing scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant