CN114928826A - Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation - Google Patents
Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
- Publication number
- CN114928826A (application CN202210358813.2A)
- Authority
- CN
- China
- Prior art keywords
- task
- vehicle
- state
- action
- decision
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04W28/0925—Network traffic management; load balancing or load distribution; management thereof using policies
- H04W28/0933—Management thereof using policies based on load-splitting ratios
- H04W28/18—Central resource management; negotiating wireless communication parameters
- H04W28/20—Negotiating bandwidth
- H04W4/48—Services specially adapted for vehicles, e.g. vehicle-to-pedestrians [V2P], for in-vehicle communication
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a two-stage optimization method, a controller and a decision method for software-defined vehicle-mounted task offloading and resource allocation, wherein the two-stage optimization method comprises the following steps: 1. obtaining information such as the RSUs that vehicle tasks can access and the information of the on-board tasks; 2. converting the two-stage vehicle-mounted task offloading and resource allocation optimization into a mathematical problem; 3. solving the mathematical problem of step 2 with a deep reinforcement learning method; 4. deploying the algorithm on the software-defined central controller, where the SDN controller determines the optimal offloading and resource allocation decision for the on-board tasks according to the current state information of the network and its nodes. The invention fully considers the offloading location and appropriate resource scheduling of each on-board task, and maximizes the benefit of the edge computing server provider while ensuring that computing resources are fully utilized and the computation delay meets task requirements.
Description
Technical Field
The invention belongs to the field of vehicle-mounted edge task processing and relates to a software-defined vehicle-mounted task offloading and resource allocation optimization method based on deep reinforcement learning, a controller, and a decision method, which can be used in small-base-station environments. By making reasonable offloading and resource allocation decisions for on-board tasks, the execution delay of the tasks can be effectively reduced; the method is particularly suitable for handling the execution fault-tolerance problem when bursty hot-spot tasks are offloaded.
Background
With the rapid increase in wireless communication demand, the transmission rates and network capacity of conventional networks face unprecedented challenges. In addition, new service scenarios in 5G and 6G network environments, such as vehicular networks, augmented and virtual reality, and the industrial Internet of Things, place higher demands on latency, energy efficiency and other performance metrics. Facing these growing challenges, edge computing is expected to provide available computing services for resource-constrained devices. It can effectively shorten the data transmission distance between user equipment and the data center and avoid network congestion, and it allows a vehicle to move its on-board tasks to the network edge for computation. Since many computing tasks can be completed near the data source, the computing load can be balanced in a distributed manner.
Mainstream approaches to balancing the computing load only consider allocating computing resources between servers; they do not, for each individual on-board task, decide which single server it should be offloaded to and how much computing resource it should be allocated. This ignores the impact of individual on-board tasks on the overall load balance.
Wireless communication and computing resources are often very limited and energy-intensive, which makes it difficult to meet the increasing and dynamic demands of Internet-of-Things applications and to address the heterogeneous requirements of smart objects communicating over the Internet. Therefore, flexible resource management, intelligent network control and efficient task scheduling algorithms play an important role in ensuring fair and guaranteed performance. Software-defined networking (SDN) can be used to implement flexible, collaborative task offloading service orchestration in cloud and mobile edge computing (MEC), and a service orchestration scheme is provided to reduce network load, task delay and energy consumption.
Task scheduling in dynamic Internet-of-Things environments is often one of the most challenging resource management problems, because it typically involves difficult online decisions whose proper solutions depend on dynamic workloads and interaction with the surrounding environment. Since deep reinforcement learning (DRL) is well suited to solving problems that involve interaction with a dynamic environment, we propose a DRL-based offloading decision model that is aware of resource requirements, access networks and user mobility. Importantly, when deciding on the current task from the learned offloading knowledge, it takes into account the future data dependencies of subsequent tasks. With this model, the optimal policy can be obtained directly from the environment without complex computation of the offloading solution.
This invention mainly studies how the SDN controller obtains global state-awareness data of the network, for example the number of all on-board tasks in the area and the load state of the MEC servers in the area. A deep deterministic policy gradient optimization algorithm deployed on the SDN controller then makes action decisions on the current state data and directly gives offloading and resource allocation suggestions for all on-board tasks; the SDN controller forwards the command data to all roadside access units (RSUs) so that the specific decisions are executed. The deep reinforcement learning model keeps updating and learning from the decision results and the environment state data, so that the final decisions it gives are optimal.
In view of the above, it is desirable to provide a deep-reinforcement-learning-based two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation that can handle the offloading of on-board tasks while taking various influencing factors into account.
Disclosure of Invention
In view of the above problems, the invention provides a two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation, which is used to solve the server load imbalance caused by on-board task computation and comprises the following steps:
Step 1, acquiring the set of RSUs that the vehicle accesses, related information about the tasks whose offloading is requested in the RSU area, the load of the MEC server, and the load of the local server corresponding to the RSU;
Step 2, converting the two-stage vehicle-mounted task offloading and resource allocation optimization into a mathematical problem;
Step 3, establishing a Markov model and solving the mathematical problem of step 2;
Step 4, deploying the algorithm to the SDN controller.
Further, the information in step 1 includes:
① the computation delay constraints of the tasks, defined as t = {T_1, …, T_j, …, T_n}, where T_j denotes the delay constraint of the j-th task;
② the set of local servers that an on-board task may be sent to, defined as ser = {SER_1, …, SER_i, …, SER_n};
③ the CPU computing power of the servers (number of cycles per second), defined as h = {H_1, …, H_n}, where H_i denotes the CPU computing power of SER_i;
④ the set of on-board tasks currently to be processed, q = {Q_1, …, Q_j, …, Q_n};
⑤ the number of CPU cycles required to compute each on-board task, D = {D_1, …, D_j, …, D_n};
⑥ the computing power of the MEC server, f_m;
⑦ the total transmission power currently available to the RSUs, e_r.
Further, the method for converting the two-stage vehicle-mounted task offloading and resource allocation optimization into a mathematical problem in step 2 comprises the following steps:
Step 2.1, the size of the computing task to be executed is written as B_n, the input data volume in kbits. B_n represents the size of the computing input data (for computation-intensive tasks) required to compute task R_n, including program code and input parameters.
Step 2.2, because the RSUs in different areas and the local servers are of the same type and use the same spectrum, interference exists between different RSU areas. The signal-to-noise ratio N_k(t) of RSU device k at time slot t (equation (1)) is determined by the transmission power allocated for offloading and the Gaussian noise power of the channel,
where e_{m,k}(t) is the transmission power allocated to RSU device k for offloading to the MEC; assuming the currently allocated transmission power ratio is κ_e, then:
e_{m,k}(t) = κ_e · e_r    (2)
N is the Gaussian noise power inside the channel, realized by a randomly generated Gaussian noise function.
According to Shannon's theorem, the maximum data rate achievable between RSU device k and the macro base station m is:
R_n = W · log2(1 + N_k(t))    (3)
where R_n is the available link rate and W is the bandwidth of the link; assuming the total bandwidth provided by the macro base station at the MEC layer is B_m and τ_k is the proportion of bandwidth allocated to RSU device k, the expression for W is:
W = B_m · τ_k    (4)
where N_k(t) is the signal-to-noise ratio, usually expressed in decibels (dB) as 10·lg N_k(t).
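As an illustration of equations (2) to (4), the following Python sketch computes the allocated transmission power, the allocated bandwidth and the resulting Shannon rate. The function and parameter names are hypothetical, and the explicit SNR expression is an assumption standing in for equation (1), which is not reproduced in the text above.

```python
import math

def uplink_rate(e_r, kappa_e, B_m, tau_k, channel_gain, noise_power):
    """Shannon rate between RSU device k and macro base station m (eqs. (2)-(4)).

    A minimal sketch: the SNR of eq. (1) is assumed to be
    allocated transmit power * channel gain / Gaussian noise power.
    """
    e_mk = kappa_e * e_r                      # eq. (2): allocated transmission power
    W = B_m * tau_k                           # eq. (4): allocated bandwidth
    snr = e_mk * channel_gain / noise_power   # assumed form of eq. (1)
    return W * math.log2(1.0 + snr)           # eq. (3)

# Example: 20% of 10 W transmit power and 25% of 20 MHz bandwidth
print(uplink_rate(e_r=10.0, kappa_e=0.2, B_m=20e6, tau_k=0.25,
                  channel_gain=1e-6, noise_power=1e-9))
```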
Step 2.3, define the time required for an on-board task to be executed locally. After the SDN makes the decision that the local server executes the task, the time required for task execution is:
T_n^l = D_n / f_n^l    (5)
where D_n represents the number of CPU cycles required to compute the task and f_n^l represents the computing resource (number of available CPU cycles per second) allocated to the task by the local server; assuming the computing power of the local server is f_l and the proportion of computing resources allocated by the decision is k_l, then:
f_n^l = f_l · k_l    (6)
Step 2.4, according to step 2.2, if the SDN decides to offload the on-board task to the MEC server for execution, the RSU needs to transmit the current task to the MEC server, so the transmission time of the task to the MEC is T_n^trans = B_n / r_n, where r_n represents the uplink rate in the radio channel (time-varying and allocatable), computed from the allocated bandwidth, and B_n is the task data size.
Step 2.5, according to step 2.4, after the task is transmitted to the MEC server, the MEC server computes the current task; the time required for the task to be executed on the MEC server is:
T_n^mec = D_n / f_n    (7)
where D_n is the number of CPU cycles required by the task and f_n is the computing resource allocated to the current task on the MEC (time-varying and allocatable); assuming the computing power of the MEC server is f_m and the proportion of computing resources allocated to the task by the decision is k_m, then:
f_n = f_m · k_m    (8)
Step 2.6, according to step 2.5, after the MEC server executes the task, the result needs to be returned to the RSU; the return time of the task result is:
T_n^back = B_b / r_b    (9)
where B_b is the data size of the processing result and r_b is the download rate.
Step 2.7, according to steps 2.3, 2.4, 2.5 and 2.6, an on-board task has two processing modes: it is either executed on the local server or offloaded to the MEC server for execution, so the total time of the task has two cases. In the first case the task is executed locally and its total time equals the local execution time of equation (5):
T_n^1 = T_n^l    (10)
The detailed execution flow of the first case is shown in FIG. 4, the detailed flow chart of local execution of an on-board task.
The second case is that the task is offloaded, and its total time is the sum of the transfer time of the task onto the MEC, the execution time of the task on the MEC server, and the return time of the result:
T_n^2 = T_n^trans + T_n^mec + T_n^back    (11)
The detailed execution flow of the second case is shown in FIG. 5, the detailed flow chart of an on-board task offloaded to the MEC server for execution.
The computation delay of a single task is defined as D_j^delay, the computation delay of task j (the n-th task); it equals the total time of task j, given by equation (10) or (11), plus the computation delay D_x^delay of the task x that precedes it, the first n-1 tasks having already been executed:
D_j^delay = T_j + D_x^delay    (12)
Step 2.8, according to step 2.7, the sum of the calculated delays for all tasks can be found to be:
Step 2.9, combining step 2.5, step 2.6 and step 2.7, the two-stage vehicle-mounted task offloading and resource allocation optimization is converted into solving equation (14), which minimizes the total computation delay of equation (13) under each task's delay constraint and the available computing, transmission power and bandwidth resources.
further, a markov model is established and the mathematical problem in equation (14) is solved. The key to solving equation (14) is to find the optimal placement sequence of the vehicle-mounted tasks and the most reasonable computing resources allocated by each task. The method comprises the following specific steps:
step 3.1, establishing a Markov state space:
S={t,h,a k (t),H m,k (t),f m ,e r ,B m }
wherein the various parameters are specified below:
①t={T 1 ,…,T j ,…,T n is a vehicle-mounted task R n A set of computational delay constraints;
②h={H 1 ,…,H i ,…,H n is the available CPU cycles (computing power) of the local server ser;
③a k (t)={a 1 (t),a 2 (t),...,a k (t),...,a κ (t) calculating the size of a service device (RSU) arrival task in a task queue;
④H mm,k (t)={H m,1 (t),H mm,2 (t),...,H m,k (t),...,H m,κ (t) } channel vectors of traffic devices (RSUs) k for uplink transmission;
⑤f m is the current available CPU cycles (computing power) of the MEC server;
⑥e r total transmission power currently available for RSUs
⑦B m The total bandwidth provided by the macro base station at the MEC layer.
Step 3.2, establishing a Markov action space:
A = {k_l, k_m, κ_e, τ_k}
wherein the parameters are specified as follows:
① k_l is the proportion of the local server's computing resources allocated to on-board task q; if k_l = 0, no computing resource is allocated, i.e. the on-board task is not computed or is offloaded to the MEC server for computation;
② k_m is the proportion of computing resources allocated to on-board task q after it is offloaded to the MEC server; if k_m = 0, the on-board task q is executed on the local server or is not computed;
③ κ_e is the proportion of transmission power allocated to the RSU for offloading the task to the MEC server;
④ τ_k is the proportion of bandwidth allocated to the service device for computation offloading.
Step 3.3, establishing a Markov reward function:
reward = ε(η) × base + κ × [t - (T + D_{n-1})]    (15)
wherein the parameters are specified as follows:
① ε(η) is a step function: when ε(η) = 1 the on-board task is computed successfully, and when ε(η) = 0 it is not computed successfully;
② base is a constant representing the basic reward; ε(η) × base means that the basic reward is obtained when an on-board task is computed successfully and is not obtained when it fails;
③ T + D_{n-1} represents the computation delay caused by computing the on-board task;
④ κ is a weight and t is the maximum computation delay allowed for the on-board task (its delay constraint), so κ × [t - (T + D_{n-1})] means that the more time is saved in computing the on-board task, the more reward is obtained; conversely, if the task exceeds the specified maximum duration, it is penalized, and the more the time is exceeded, the greater the penalty (a code sketch of this reward follows this list).
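A small Python sketch of the reward in equation (15) follows; it assumes, as described above, that t_max is the task's delay constraint and actual_delay corresponds to T + D_{n-1}, and the names and default weights are hypothetical.

```python
def reward(success, actual_delay, t_max, base=10.0, kappa=1.0):
    """Eq. (15): base reward for a successful computation plus a weighted bonus
    for finishing before the delay constraint (negative if the constraint is exceeded)."""
    eps = 1.0 if success else 0.0          # step function epsilon(eta)
    return eps * base + kappa * (t_max - actual_delay)

# Example: a task allowed 0.5 s that finished successfully in 0.3 s
print(reward(success=True, actual_delay=0.3, t_max=0.5))   # 10.2
```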
Step 3.4, according to the Markov model of step 3.3, use the deep deterministic policy gradient optimization algorithm to solve for the optimal offloading and resource scheduling decision:
Step 3.4.1, establish an Actor current network, an Actor target network, a Critic current network and a Critic target network; these four networks are described as follows:
① the parameter of the Actor current network is θ; θ also refers to the neural network itself, which is responsible for updating the network parameter θ and for generating the current action A according to the current state S. The action A acts on the current state S to produce the state S' and the reward R, which is obtained from the reward function;
② the parameter of the Actor target network is θ'; θ' also refers to a neural network, responsible for selecting the action A' according to the next state S' sampled from the experience replay pool and for updating θ';
③ the parameter of the Critic current network is ω; it also refers to a neural network, responsible for computing the current Q value, which is used to measure how good the selected action is.
④ the parameter of the Critic target network is ω'; it also refers to a neural network, responsible for computing the target Q value, i.e. Q'.
Step 3.4.2, train the Actor current network, the Actor target network, the Critic current network and the Critic target network. The specific steps are as follows (a code sketch is given after these steps):
3.4.2.1, first obtain the initialization state S; the Actor current network generates an action A according to the state S;
3.4.2.2, compute the reward R according to the state S and the action A, and obtain the next state S';
3.4.2.3, store {S, A, R, S'} in the experience replay pool;
3.4.2.4, record the current state as S';
3.4.2.5, compute the current Q value and the target Q value;
3.4.2.6, update the Critic current network parameter ω;
3.4.2.7, update the Actor current network parameters;
3.4.2.8, if the current state S' is a termination state, the iteration is complete; otherwise go to step 3.4.2.2.
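The following PyTorch-style Python sketch illustrates one update of steps 3.4.2.5 to 3.4.2.7 under common deep deterministic policy gradient assumptions; the network sizes, learning rates, the sigmoid squashing of actions to [0, 1] ratios and the soft-update rate tau are hypothetical choices, not values specified by this description.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

state_dim, action_dim, gamma, tau = 8, 4, 0.99, 0.005
actor, actor_target = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
critic, critic_target = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    """One update from a replay-pool batch {S, A, R, S'}; R and done are column tensors."""
    S, A, R, S_next, done = batch
    with torch.no_grad():
        A_next = torch.sigmoid(actor_target(S_next))             # action A' from the Actor target network
        Q_next = critic_target(torch.cat([S_next, A_next], 1))   # target network estimate Q'
        y = R + gamma * (1 - done) * Q_next                      # target Q value
    critic_loss = nn.functional.mse_loss(critic(torch.cat([S, A], 1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()       # update omega
    actor_loss = -critic(torch.cat([S, torch.sigmoid(actor(S))], 1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()          # update theta
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)                       # soft-update theta', omega'
```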
Interpretation of the Q value: the Q value is the value of the action-value function Q_π(S, A). Q_π(S, A) represents the value of taking action A in state S while following policy π, that is, the expected return obtained by the Agent when, starting from state S at time t, it executes action A and thereafter follows policy π. The calculation formula is:
Q_π(S, A) = E_π(G_t | S_t = s, A_t = a)    (17)
Here the function Q_π is called the action-value function of policy π. For a state-action pair, each action-value is determined by the value of taking that action in that state. G_t denotes the return; in a Markov decision process, the return G_t is defined as the sum of the reward values obtained from the current state S_t to the termination state S_T, where the subscript of R starts at t + 1:
G_t = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + …,  γ ∈ [0, 1]    (18)
where R_t is the reward at time t and γ is the discount factor. In practice there is always great uncertainty about future rewards, and future rewards are discounted to avoid sequences that are too long or returns that tend to infinity in continuing tasks.
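As a quick illustration of equation (18), a hypothetical helper computing the discounted return from a list of rewards:

```python
def discounted_return(rewards, gamma=0.99):
    """Eq. (18): G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ..."""
    G = 0.0
    for k, r in enumerate(rewards):   # rewards[0] is R_{t+1}
        G += (gamma ** k) * r
    return G

print(discounted_return([1.0, 0.5, 2.0], gamma=0.9))   # 1.0 + 0.45 + 1.62 = 3.07
```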
The policy π is interpreted as follows: under a stochastic policy, an Agent can execute several actions in a given state, for example with action probability distribution (0.2, 0.2, 0.3, 0.3); the stochastic policy maps a state to the probability of executing each action. The stochastic policy function can be expressed as:
π(a|s) = P(a|s) = P(A_t = a | S_t = s)    (19)
Equation (19) represents the probability that the Agent executes action a according to policy π when it is in state s.
A Markov decision process (MDP) applies a policy to generate a sequence as follows (a rollout sketch is given after these steps):
① generate an initial state S_i = S_0 from the initial-state distribution;
② according to policy π(a|s), choose action A_i and execute it;
③ obtain the reward R_{i+1} and the next state S_{i+1} according to the reward function and the state-transition function;
④ S_i = S_{i+1}.
Repeating steps ② to ④ continuously generates the sequence:
{S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, S_3, …}
If the task is episodic, the sequence ends in the terminal state S_goal; if the task is continuing, the sequence continues indefinitely.
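A minimal sketch of this rollout procedure, assuming a generic environment object env with hypothetical reset and step methods and a policy function returning an action:

```python
def rollout(env, policy, max_steps=1000):
    """Generate the sequence {S_0, A_0, R_1, S_1, A_1, R_2, ...} by following a policy."""
    trajectory, state = [], env.reset()                 # step 1: initial state S_0
    for _ in range(max_steps):
        action = policy(state)                          # step 2: A_i chosen from pi(a|s)
        next_state, reward, done = env.step(action)     # step 3: reward R_{i+1}, state S_{i+1}
        trajectory.append((state, action, reward))
        state = next_state                              # step 4
        if done:                                        # episodic task: stop at S_goal
            break
    return trajectory
```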
Finally, the policy is updated according to the action-value function: the larger the value of the action-value function, the better the policy, and the model algorithm keeps learning so that the final decision it gives is the best decision.
Based on the above optimization method, the invention further provides an SDN controller on which the optimization method is deployed. The specific method of deploying the algorithm to the SDN controller is: after training of the deep deterministic policy gradient optimization algorithm is completed, save the Actor current network and deploy it to the SDN controller.
Based on this SDN controller, when on-board tasks need offloading and resource scheduling, the SDN controller determines the optimal offloading and resource allocation decision for the on-board tasks according to the current state information of the network and its nodes. The specific decision method is as follows:
The network and node state information is forwarded to the SDN controller, and the SDN controller forwards the processing decision to the RSUs; the specific process is as follows (a switch-side sketch follows these steps):
Step 4.1, the sender connects to the network and transmits to the SDN switch a corresponding data packet comprising: the set of RSUs that the vehicle accesses, related information about the tasks whose offloading is requested in the RSU area, the load of the MEC server, the load of the local server corresponding to the RSU, and so on;
Step 4.2, after receiving the data packet and parsing the packet header, the SDN switch queries whether a corresponding flow rule for the packet exists in its flow table; if the match succeeds, the packet is forwarded directly to the corresponding port; if the match fails, the next step is taken;
Step 4.3, the SDN switch generates a Packet-in event for the data packet and transmits the Packet-in packet to the controller over the TCP or TLS protocol;
Step 4.4, after receiving the Packet-in packet, the SDN controller generates a corresponding forwarding policy according to the related application and sends it to the corresponding SDN switch; the policy contains the decision of the deep deterministic policy gradient optimization algorithm deployed in the SDN controller and the RSU to which the decision is to be forwarded;
Step 4.5, the SDN switch adds the flow rule to its flow table and forwards the corresponding data packet to the specified interface, thereby completing the data forwarding.
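A simplified Python sketch of the switch-side logic of steps 4.2 to 4.5; the flow-table structure and the controller interface are hypothetical, and a real SDN switch implements this with OpenFlow-style rules rather than a Python dictionary.

```python
flow_table = {}   # match key -> output port, populated by the controller

def handle_packet(pkt, controller):
    """Step 4.2: match against the flow table; steps 4.3-4.5: ask the controller on a miss."""
    key = (pkt["src"], pkt["dst"])
    if key in flow_table:                        # flow rule already installed
        return flow_table[key]                   # forward to the matched port
    decision = controller.packet_in(pkt)         # steps 4.3-4.4: Packet-in, controller decides
    flow_table[key] = decision["port"]           # step 4.5: install the new flow rule
    return decision["port"]                      # forward to the specified interface
```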
The beneficial effects of the invention are as follows:
1. The invention fully considers the offloading location and appropriate resource scheduling of each on-board task, and maximizes the benefit of the edge computing server provider while ensuring that computing resources are fully utilized and the computation delay meets task requirements.
2. The invention designs a complete on-board task computation flow algorithm (the deep deterministic policy gradient optimization algorithm). The task flow is detailed in the on-board task offloading and resource allocation decision flow chart of FIG. 1, and the algorithm flow is detailed in the deep deterministic policy gradient optimization algorithm flow chart of FIG. 2; together they cover the entire flow of an on-board task from offloading through computation to result return, so the algorithm is complete and practical.
3. Based on the algorithm design of the invention, different requirements can be met adaptively simply by further adjusting the reward function; for example, to save the computing resources of the MEC server, a reward term can be defined on the remaining MEC resources so that the algorithm adaptively learns to save computing resources. The reward function can thus be adjusted according to the emphases of different users to achieve different benefits (a sketch of such a reward term follows this list).
4. The invention uses the SDN controller to collect and forward global information and to make policy decisions. SDN decouples the forwarding plane and control plane of network devices; the controller is responsible for managing the network devices, orchestrating network services and scheduling service flows, and offers advantages such as low cost, centralized management and flexible scheduling.
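As an illustration of point 3, a hypothetical extension of the equation-(15) reward that additionally rewards leaving MEC computing resources idle; the weight beta and the name mec_free_ratio are assumptions, not part of the original description.

```python
def shaped_reward(success, actual_delay, t_max, mec_free_ratio,
                  base=10.0, kappa=1.0, beta=2.0):
    """Eq. (15) plus an assumed bonus proportional to the MEC server's idle resources."""
    eps = 1.0 if success else 0.0
    return eps * base + kappa * (t_max - actual_delay) + beta * mec_free_ratio
```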
Drawings
FIG. 1 is a flow chart of an on-board task offloading and resource allocation decision
FIG. 2 is a flow chart of the deep deterministic policy gradient optimization algorithm
FIG. 3 is a flowchart of a specific processing procedure of the vehicle-mounted task
FIG. 4 is a detailed flow chart of the local execution of the vehicle task
FIG. 5 is a flowchart illustrating the execution of the vehicle-mounted task off-loaded to the MEC server
Detailed Description
The invention will be further explained with reference to the drawings.
The present invention is further described below with reference to FIG. 1. It should be noted that this example is implemented on the basis of the present technical solution and gives detailed implementation procedures and steps, but the scope of protection of the present invention is not limited to this example.
Suppose that at this moment vehicle j sends on-board task Q_j to the RSU. Then, according to a specific embodiment of the present invention:
(1) In step 1, the SDN controller is used to collect the relevant information: the set t of computation delay constraints of the on-board tasks R_n, the set q of on-board tasks to be processed, the set of CPU cycles required by each on-board task, the channel vector H_{m,k} of each service device (RSU) k for uplink transmission, and the total available bandwidth B_m of the MEC server.
(2) From the data obtained in (1), according to step 2, the computation delay of task Q_j is calculated using the formulas of steps 2.3 to 2.7 (equations (5) to (12)).
(3) The SDN controller aggregates the information of the other vehicles together with the local servers and the MEC server and, according to step 2, calculates the computation delay of all on-board tasks on the servers using equation (13).
(4) The SDN controller aggregates the load information of the local servers and the MEC server and, according to step 2, converts the two-stage on-board task offloading and resource allocation optimization into the mathematical problem of equation (14).
(5) According to step 3, the mathematical problem is solved using the deep deterministic policy gradient optimization algorithm. In step 3.4.2:
1. The initialization state S, i.e. the state of each RSU and the completion status of all vehicle tasks, is obtained first. The Actor current network then generates an action A according to the state S, where A is the offloading and resource allocation decision selected for a certain vehicle task (a hypothetical decoding of such an action into the four ratios of the action space is sketched after this list). The specific method is: compute the feature vector φ(S) of the state S; the action is A = π_θ(φ(S)), where π_θ denotes the policy (i.e. the action) produced by the neural network θ, and the neural network θ (the Actor current network) selects the vehicle task offloading and resource allocation decision according to the states of the current RSU local servers and the MEC server and other information;
2. The reward R is computed from the current state S and the action A, and a new state S' is generated. After a particular vehicle task offloading and resource allocation decision is selected, the states of each RSU local server and the MEC server and the completion status of all vehicle tasks change; the new state is defined as S';
3. Store {S, A, R, S'} in the experience replay pool; the purpose is to train the neural networks better. The Actor target network θ' selects an action A' according to the S' entries in the experience pool;
4. Record the current state as S';
5. Compute the current Q value and the target Q value:
Q(S, A, ω) is the current Q value, computed by feeding the state S and the action A into the Critic current network ω (the calculation process is not written out because the neural network is extremely complex); y is the target Q value, typically formed from the reward R and the discounted target-network estimate, where Q'(S', A', ω') is computed in the same manner as Q(S, A, ω).
6. Update the Critic current network ω using the current Q value and the target Q value:
ω ← ω + (y - Q(S, A, ω))
y represents the more accurate Q value, and ω + (y - Q(S, A, ω)) means that the Critic current network ω updates itself using this Q-value error. Because the update of a neural network is extremely complex, only the basic idea of the update is shown here;
7. The Critic current network ω helps the Actor current network θ to update:
θ ← θ - TD(S, A, ω)
TD(S, A, ω) denotes the error, computed by ω, between the action A selected in state S and the optimal action, and θ - TD(S, A, ω) means that the Actor current network θ eliminates this error. Because the update of a neural network is extremely complex, only the basic idea of the update is presented here;
8. If the current state S' is a termination state, the iteration ends and the Actor current network gives the optimal offloading and resource allocation decision; otherwise go to step 2.
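To connect the Actor network's output in step 1 with the action space A = {k_l, k_m, κ_e, τ_k} of step 3.2, a hypothetical decoding helper (the clipping to [0, 1] is an assumption):

```python
def decode_action(raw_action):
    """Map the Actor network's 4-dimensional output to the ratios of the action space."""
    k_l, k_m, kappa_e, tau_k = (min(max(float(x), 0.0), 1.0) for x in raw_action)
    return {"k_l": k_l, "k_m": k_m, "kappa_e": kappa_e, "tau_k": tau_k}

print(decode_action([0.3, 0.7, 0.5, 0.25]))
```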
Further, for steps (1) to (5) above, the specific data forwarding flow of the SDN controller is described as follows:
when a sender wants to communicate with a receiver, the basic communication flow mainly includes the following steps.
Step 1, the sender (each roadside access unit, RSU) connects to the network and transmits the corresponding data packet (the set of RSUs that the vehicle accesses, information about the tasks whose offloading is requested in the RSU area, the load of the MEC server, the load of the local server corresponding to the RSU, etc.) to the SDN switch.
Step 2, after receiving the data packet and parsing the packet header, the SDN switch queries its flow table for a flow rule matching the packet. If the match succeeds, the packet is forwarded directly to the corresponding port; if the match fails, the next step is taken.
Step 3, the SDN switch generates a Packet-in event for the data packet and transmits the Packet-in packet to the controller over the TCP or TLS protocol.
Step 4, after receiving the Packet-in packet, the SDN controller generates a corresponding forwarding policy (containing the decision of the deep deterministic policy gradient optimization algorithm deployed in the SDN controller and the RSU to which the decision is to be forwarded) according to the related application, and sends the decision to the corresponding SDN switch.
Step 5, the SDN switch adds the flow rule to its flow table and forwards the corresponding data packet to the specified interface, thereby completing the data forwarding.
Further, details of the flow of the task-specific processing procedure of the present invention are shown in the flowchart of the in-vehicle task-specific processing procedure of fig. 3.
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A two-stage optimization method for software-defined vehicle-mounted task unloading and resource allocation is characterized by comprising the following steps:
step 1, acquiring the set r of RSUs accessed by the vehicle, information about the vehicle tasks whose offloading is requested in the RSU area, the load of the MEC server and the load of the local server corresponding to the RSU;
step 2, modeling the two-stage vehicle-mounted task offloading and resource allocation optimization;
step 3, establishing a Markov model and solving the model of step 2.
2. The two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation according to claim 1, wherein the information obtained in step 1 includes:
step 1.1, obtaining the computation delay constraints of the tasks, defined as:
t = {T_1, …, T_j, …, T_n}
step 1.2, acquiring the set of local servers that an on-board task may be sent to, defined as:
ser = {SER_1, …, SER_i, …, SER_n}
step 1.3, obtaining the CPU computing power of the servers (available cycles per second), defined as:
h = {H_1, …, H_n}
where H_i denotes the CPU computing power of SER_i;
step 1.4, acquiring the set of on-board tasks currently to be processed, defined as:
q = {Q_1, …, Q_j, …, Q_n}
step 1.5, acquiring the number of CPU cycles required to compute each on-board task, defined as:
D = {D_1, …, D_j, …, D_n}
step 1.6, acquiring the computing power of the MEC server, defined as f_m;
step 1.7, obtaining the total transmission power currently available to the RSUs, defined as e_r.
3. The two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation according to claim 1, wherein the specific process of step 2 comprises:
step 2.1, the size of the computing task to be executed is written as B_n, the input data volume in kbits; B_n represents the size of the computing input data (for computation-intensive tasks) required to compute task R_n, including program code and input parameters;
step 2.2, defining the signal-to-noise ratio N_k(t) of RSU device k at time slot t (equation (1)), determined by the transmission power allocated for offloading and the Gaussian noise power of the channel, where e_{m,k}(t) is the transmission power allocated to RSU device k for offloading to the MEC; assuming the currently allocated transmission power ratio is κ_e, then:
e_{m,k}(t) = κ_e · e_r    (2)
according to Shannon's theorem, the maximum data rate achievable between RSU device k and the macro base station m is:
R_n = W · log2(1 + N_k(t))    (3)
where R_n is the available link rate and W is the bandwidth of the link; assuming the total bandwidth provided by the macro base station at the MEC layer is B_m and τ_k is the proportion of bandwidth allocated to RSU device k, the expression for W is:
W = B_m · τ_k    (4)
where N_k(t) is the signal-to-noise ratio, usually expressed in decibels (dB) as 10·lg N_k(t);
step 2.3, defining the time required for an on-board task to be executed locally; after the SDN makes the decision that the local server executes the task, the time required for task execution is:
T_n^l = D_n / f_n^l    (5)
where D_n represents the number of CPU cycles required to compute the task and f_n^l represents the computing resource (number of available CPU cycles per second) allocated to the task by the local server; assuming the computing power of the local server is f_l and the proportion of computing resources allocated by the decision is k_l, then:
f_n^l = f_l · k_l    (6)
step 2.4, if the SDN decides to offload the on-board task to the MEC server for execution, the RSU needs to transmit the current task to the MEC server, so the transmission time of the task to the MEC is T_n^trans = B_n / r_n, where r_n represents the uplink rate in the radio channel (time-varying and allocatable), computed from the allocated bandwidth, and B_n is the task data size;
step 2.5, according to step 2.4, after the task is transmitted to the MEC server, the MEC server computes the current task, and the time required for the task to be executed on the MEC server is:
T_n^mec = D_n / f_n    (7)
where D_n is the number of CPU cycles required by the task and f_n is the computing resource allocated to the current task on the MEC (time-varying and allocatable); assuming the computing power of the MEC server is f_m and the proportion of computing resources allocated to the task by the decision is k_m, then:
f_n = f_m · k_m    (8)
step 2.6, according to step 2.5, after the MEC server executes the task, the result needs to be returned to the RSU; the return time of the task result is:
T_n^back = B_b / r_b    (9)
where B_b is the data size of the processing result and r_b is the download rate;
step 2.7, according to steps 2.3, 2.4, 2.5 and 2.6, an on-board task has two processing modes: it is executed by the local server or offloaded to the MEC server for execution, so the total time of the task has two cases; in the first case the task is executed locally and its total time equals the local execution time:
T_n^1 = T_n^l    (10)
in the second case the task is offloaded and its total time is the sum of the transfer time of the task onto the MEC, the execution time on the MEC server and the return time of the result:
T_n^2 = T_n^trans + T_n^mec + T_n^back    (11)
the computation delay of a single task is defined as D_j^delay, the computation delay of task j (the n-th task); it equals the total time of task j, given by equation (10) or (11), plus the computation delay D_x^delay of the preceding task x, the first n-1 tasks having already been executed:
D_j^delay = T_j + D_x^delay    (12)
step 2.8, according to step 2.7, the sum of the computation delays of all tasks is:
D_total = Σ_j D_j^delay    (13)
step 2.9, combining step 2.5, step 2.6 and step 2.7, the on-board task edge scheduling and resource allocation decision method is converted into solving equation (14), which minimizes the total computation delay of equation (13) under each task's delay constraint and the available computing, transmission power and bandwidth resources.
4. The two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation according to claim 1, wherein the specific process of establishing the Markov model in step 3 includes:
step 3.1, establishing a Markov state space:
S = {t, h, a_k(t), H_{m,k}(t), f_m, e_r, B_m}
wherein the parameters are specified as follows:
① t = {T_1, …, T_j, …, T_n} is the set of computation delay constraints of the on-board tasks R_n;
② h = {H_1, …, H_i, …, H_n} is the available CPU cycles (computing power) of the local servers ser;
③ a_k(t) = {a_1(t), a_2(t), …, a_k(t), …, a_κ(t)} is the size of the tasks arriving in the computing task queue of each service device (RSU);
④ H_{m,k}(t) = {H_{m,1}(t), H_{m,2}(t), …, H_{m,k}(t), …, H_{m,κ}(t)} is the channel vector of service device (RSU) k for uplink transmission;
⑤ f_m is the currently available CPU cycles (computing power) of the MEC server;
⑥ e_r is the total transmission power currently available to the RSUs;
⑦ B_m is the total bandwidth provided by the macro base station at the MEC layer;
step 3.2, establishing a Markov action space:
A = {k_l, k_m, κ_e, τ_k}
wherein the parameters are specified as follows:
① k_l is the proportion of the local server's computing resources allocated to on-board task q; if k_l = 0, no computing resource is allocated, i.e. the on-board task is not computed or is offloaded to the MEC server for computation;
② k_m is the proportion of computing resources allocated to on-board task q after it is offloaded to the MEC server; if k_m = 0, the on-board task q is executed on the local server or is not computed;
③ κ_e is the proportion of transmission power allocated to the RSU for offloading the task to the MEC server;
④ τ_k is the proportion of bandwidth allocated to the service device for computation offloading;
step 3.3, establishing a Markov reward function:
reward = ε(η) × base + κ × [t - (T + D_{n-1})]    (15)
wherein the parameters are specified as follows:
① ε(η) is a step function: when ε(η) = 1 the on-board task is computed successfully, and when ε(η) = 0 it is not computed successfully;
② base is a constant representing the basic reward; ε(η) × base means that the basic reward is obtained when an on-board task is computed successfully and is not obtained when it fails;
③ T + D_{n-1} represents the computation delay caused by computing the on-board task;
④ κ is a weight and t is the maximum computation delay allowed for the on-board task (its delay constraint), so κ × [t - (T + D_{n-1})] means that the more time is saved in computing the on-board task, the more reward is obtained; conversely, if the task exceeds the specified maximum duration, it is penalized, and the more the time is exceeded, the greater the penalty.
5. The two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation according to claim 4, characterized in that, according to the designed Markov model, a deep deterministic policy gradient optimization algorithm is used to solve the optimal offloading and resource scheduling decision, and the specific process comprises:
step 3.4.1, establishing an Actor current network, an Actor target network, a Critic current network and a Critic target network; these four networks are described as follows:
① the parameter of the Actor current network is θ; θ also refers to the neural network itself, which is responsible for updating the network parameter θ and for generating the current action A according to the current state S; the action A acts on the current state S to produce the state S' and the reward R, which is obtained from the reward function;
② the parameter of the Actor target network is θ'; θ' also refers to a neural network, responsible for selecting the action A' according to the next state S' sampled from the experience replay pool and for updating θ';
③ the parameter of the Critic current network is ω; it also refers to a neural network, responsible for computing the current Q value, which is used to measure how good the selected action is;
④ the parameter of the Critic target network is ω'; it also refers to a neural network, responsible for computing the target Q value, i.e. Q';
step 3.4.2, training the Actor current network, the Actor target network, the Critic current network and the Critic target network.
6. The two-stage optimization method for software-defined vehicle-mounted task offloading and resource allocation according to claim 5, wherein the specific process of step 3.4.2 includes:
step 3.4.2.1, first obtaining an initialization state S, and the Actor current network generating action A according to the state S;
step 3.4.2.2, calculating reward R according to state S and action A, and obtaining next state S';
step 3.4.2.3, storing {S, A, R, S'} in the experience replay pool;
step 3.4.2.4, recording the current state as S';
step 3.4.2.5, calculating the current Q value and the target Q value;
step 3.4.2.6, updating the Critic current network parameter omega;
step 3.4.2.7, updating the current network parameters of the Actor;
in step 3.4.2.8, if the current state S' is the termination state, the iteration is completed, otherwise go to step 3.4.2.2.
7. The two-stage optimization method for software-defined on-board task offloading and resource allocation according to claim 6, wherein the Q value is interpreted as follows: the Q value is the value of the action-value function Q_π(S, A); Q_π(S, A) represents the value of taking action A in state S while following policy π, i.e. the expected return obtained by the Agent when, starting from state S at time t, it executes action A and thereafter follows policy π; the calculation formula is:
Q_π(S, A) = E_π(G_t | S_t = s, A_t = a)    (17)
the function Q_π is called the action-value function of policy π; for a state-action pair, each action-value is determined by the value of taking that action in that state, where G_t denotes the return; in a Markov decision process the return G_t is defined as the sum of the reward values obtained from the current state S_t to the termination state S_T, the subscript of R starting at t + 1:
G_t = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + …,  γ ∈ [0, 1]    (18)
where R_t is the reward at time t and γ is the discount factor; in practice there is always great uncertainty about future rewards, and future rewards are discounted to avoid sequences that are too long or returns that tend to infinity in continuing tasks;
the policy π is interpreted as follows: under a stochastic policy, an Agent can execute several actions in a given state, for example with action probability distribution (0.2, 0.2, 0.3, 0.3); the stochastic policy maps a state to the probability of executing each action, and the stochastic policy function can be expressed as:
π(a|s) = P(a|s) = P(A_t = a | S_t = s)    (19)
equation (19) represents the probability that the Agent executes action a according to policy π when it is in state s;
the Markov decision process (MDP) applies a policy to generate a sequence as follows:
① generating an initial state S_i = S_0 from the initial-state distribution;
② according to policy π(a|s), choosing action A_i and executing it;
③ obtaining the reward R_{i+1} and the next state S_{i+1} according to the reward function and the state-transition function;
④ S_i = S_{i+1};
repeating steps ② to ④ continuously generates the sequence:
{S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, S_3, …}
if the task is episodic, the sequence ends in the terminal state S_goal; if the task is continuing, the sequence continues indefinitely;
the policy is updated according to the action-value function: the larger the value of the action-value function, the better the policy, and the model algorithm keeps learning so that the final decision it gives is the best decision.
8. A software-defined vehicle-mounted task offloading and resource allocation controller, characterized in that the controller deploys the vehicle-mounted task offloading and resource allocation two-stage optimization method of any one of claims 1-7.
9. The decision method of the controller for software-defined vehicle-mounted task offloading and resource allocation as claimed in claim 8, wherein when on-board tasks need offloading and resource scheduling, the SDN controller determines the optimal offloading and resource allocation decision for the on-board tasks according to the current network and node state information.
10. The decision method of the controller for software-defined vehicle-mounted task offloading and resource allocation according to claim 9, wherein the specific method by which the SDN controller determines the optimal offloading and resource allocation decision for the on-board task according to the current network and node state information is as follows:
the network and node state information is forwarded to the SDN controller, and the SDN controller forwards the processing decision to the RSUs; the specific process is:
step 4.1, the sender connects to the network and transmits to the SDN switch a corresponding data packet comprising: the set of RSUs that the vehicle accesses, related information about the tasks whose offloading is requested in the RSU area, the load of the MEC server, the load of the local server corresponding to the RSU, and so on;
step 4.2, after receiving the data packet and parsing the packet header, the SDN switch queries whether a corresponding flow rule for the packet exists in its flow table; if the match succeeds, the packet is forwarded directly to the corresponding port; if the match fails, the next step is taken;
step 4.3, the SDN switch generates a Packet-in event for the data packet and transmits the Packet-in packet to the controller over the TCP protocol or the TLS (Transport Layer Security) protocol;
step 4.4, after receiving the Packet-in packet, the SDN controller generates a corresponding forwarding policy according to the related application and sends it to the corresponding SDN switch; the policy contains the decision of the deep deterministic policy gradient optimization algorithm deployed in the SDN controller and the RSU to which the decision is to be forwarded;
step 4.5, the SDN switch adds the flow rule to its flow table and forwards the corresponding data packet to the specified interface, thereby completing the data forwarding.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210358813.2A | 2022-04-07 | 2022-04-07 | Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
Publications (1)
Publication Number | Publication Date
---|---
CN114928826A | 2022-08-19
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN115842731A | 2022-10-31 | 2023-03-24 | Jilin University | Configuration method of calculation resources and services of digital workshop
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination