CN116257361A

CN116257361A - Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method

Info

Publication number: CN116257361A
Application number: CN202310249124.2A
Authority: CN
Inventors: 潘春雨; 方禹; 李学华
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2023-03-15
Filing date: 2023-03-15
Publication date: 2023-06-13
Anticipated expiration: 2043-03-15
Also published as: CN116257361B

Abstract

The invention discloses an unmanned aerial vehicle-assisted method for optimizing the scheduling of fault-prone mobile edge computing resources, comprising the following steps: step one, initializing the parameters of a neural network and an experience playback pool; step two, acquiring the position parameters of each unmanned aerial vehicle and calling a server state real-time updating system to acquire first data information; step three, inputting the position parameters of each unmanned aerial vehicle and the first data information into the neural network and outputting a task execution position; step four, executing a computing task of the server state real-time updating system based on the task execution position and updating the server state real-time updating system; step five, packaging the position parameters of each unmanned aerial vehicle, the first data information, and the task execution position into one record and storing it in the experience playback pool; step six, performing a threshold judgment on the amount of data stored in the experience playback pool, optimizing the neural network parameters if the amount is within the threshold range, and returning to step two if it is not; step seven, judging whether the neural network with optimized parameters has converged, completing resource scheduling optimization if so, and returning to step two if not.

Description

Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to an unmanned aerial vehicle-assisted method for optimizing the scheduling of fault-prone mobile edge computing resources.

Background

With the development of mobile communication and the internet, sectors including industry are being transformed toward digital, information-based, and intelligent operation, which is an important direction for industrial upgrading. Unlike the consumer internet, the industrial internet is concerned with the transmission, processing, and protection of real-time data, not just the transfer of information. Its networks and devices must therefore offer higher reliability and security, so that computing tasks are processed quickly while data privacy is guaranteed. The computing tasks generated in an industrial internet system are mainly non-independent, identically distributed tasks; they vary widely in data size, type, and requirements, and the executing equipment must provide the functions each task needs. However, industrial internet terminal equipment has limited computing resources and must sustain long endurance, so it is difficult to execute computing tasks with the equipment's own resources alone. Mobile edge computing addresses the terminal's shortage of computing resources: it lets the terminal execute tasks using computing resources at the network edge and keeps data inside the system, reducing the risk of data leakage. However, the computing resources a mobile edge computing server can provide are still limited, and with the number of terminals and data tasks growing rapidly, algorithms must be designed to optimize the allocation of computing resources.
In recent years, related work has studied the optimization of computing resource allocation in mobile edge computing. Most current work on delay optimization within a system does not introduce a dynamic server crash probability, and the work that does introduce one applies only to fixed, small-scale network structures. Such work does not suit today's complex and diverse network structures or scenarios in which complex computing tasks are generated dynamically. The stability of the mobile edge computing server is critical to the on-time execution of tasks. Prior studies of servers with a dynamic crash probability do not consider changes in network scale and update the server state too infrequently, both of which strongly affect the stability of the system under study. Slow updates of the server state bias the estimated crash probability; the network scale changes dynamically with scenario demands, and a generally applicable algorithm reduces application cost. Therefore, an algorithm that is insensitive to changes in network scale and updates the server state at high frequency is needed to solve the computing resource scheduling problem of fault-prone mobile edge computing and thereby reduce the total delay in the system.

Disclosure of Invention

To solve the above technical problems, the invention provides an unmanned aerial vehicle-assisted method for optimizing the scheduling of fault-prone mobile edge computing resources, which reduces the total delay in the system.

To achieve the above purpose, the invention provides an unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method, comprising the following steps:

step one, constructing a neural network and an experience playback pool, and initializing parameters of the neural network and the experience playback pool;

step two, acquiring position parameters of each unmanned aerial vehicle, and calling a server state real-time updating system to acquire first data information;

step three, inputting the position parameters of each unmanned aerial vehicle and the first data information into the neural network to obtain a task execution position;

step four, executing a calculation task of the server state real-time updating system based on the task execution position, and updating the server state real-time updating system;

step five, packaging the position parameters of each unmanned aerial vehicle, the first data information and the task execution position into one piece of information, and storing the information into the experience playback pool;

step six, performing a threshold judgment on the amount of data stored in the experience playback pool, optimizing the neural network parameters if the amount is within the threshold range, and returning to step two if it is not;

and step seven, judging whether the neural network with optimized parameters has converged; if so, resource scheduling optimization is complete, and if not, returning to step two.
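The control flow of steps one to seven can be sketched as a loop. This is a minimal Python sketch under stated assumptions, not the patented implementation: `StubEnv` and `StubAgent` are hypothetical stand-ins for the UAV system and the neural network, the "more than half full" threshold follows the optional threshold rule described in this document, and the convergence check is a placeholder.

```python
import random
from collections import deque


class StubEnv:
    """Hypothetical stand-in for the UAVs and the server state updating system."""

    def __init__(self):
        self.executed = 0

    def observe(self):
        # Step two: UAV position parameters plus first data information (dummy values).
        return (random.random(), random.random())

    def execute(self, action):
        # Step four: run the task at the chosen position and update the server state.
        self.executed += 1


class StubAgent:
    """Hypothetical stand-in for the neural network."""

    def __init__(self):
        self.updates = 0

    def act(self, state):
        return 0  # Step three: a fixed dummy task execution position.

    def optimize(self, batch):
        self.updates += 1  # Step six: one parameter update per call.

    def converged(self):
        return self.updates >= 3  # Placeholder convergence test (assumption).


def run_scheduling_loop(agent, env, capacity=10, batch_size=4, max_iters=1000):
    pool = deque(maxlen=capacity)  # Step one: experience playback pool.
    for iteration in range(1, max_iters + 1):
        state = env.observe()
        action = agent.act(state)
        env.execute(action)
        pool.append((state, action))  # Step five: store the packaged record.
        if 2 * len(pool) <= capacity:
            continue  # Step six: too little data, return to step two.
        agent.optimize(random.sample(list(pool), batch_size))
        if agent.converged():  # Step seven.
            return iteration
    return max_iters


env = StubEnv()
iterations = run_scheduling_loop(StubAgent(), env)
```

With a pool capacity of 10, the first update happens on the sixth iteration, once the pool holds more than five records.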

Optionally, in step one, the parameters of the neural network are initialized with a Gaussian distribution.

Optionally, in step two, the first data information includes the working state information of each unmanned aerial vehicle and the computing tasks of the server state real-time updating system.

Optionally, the working state information of each unmanned aerial vehicle includes the number of tasks the server is currently executing and its remaining computing resources.

Optionally, in step six, the neural network parameters are optimized by mini-batch stochastic gradient descent.

Optionally, in step two, the server state real-time updating system stores the number of tasks each server is currently executing and its remaining computing resources in a list that grows and shrinks as tasks start and finish, updates the parallel task information and remaining computing resources of each server in real time, and calculates the crash probability and feeds it back to the neural network.

Optionally, in step six, the threshold judgment on the amount of data stored in the experience playback pool comprises:

setting the threshold range to more than one half of the storage capacity of the experience playback pool; when the amount of data stored in the experience playback pool exceeds one half of its storage capacity, optimizing the neural network parameters; and when it does not exceed one half of the storage capacity, returning to step two.

Optionally, in the seventh step, the method for judging whether the neural network converges after optimizing the parameters of the neural network includes:

$$Q_{\eta,\alpha,\beta}(s,a) = V_{\eta,\alpha}(s) + A_{\eta,\beta}(s,a) - \frac{1}{|A|}\sum_{a'} A_{\eta,\beta}(s,a')$$

where η is the network parameter shared by the state value function and the dominance function, and α and β are the parameters of the state value function and the dominance function, respectively; the input of the neural network is the state s, which comprises computing task information and server state information; the output is an offloading decision, denoted a; a' is the action at the next moment; Q is the evaluation value of action a in state s; V(s) is the state value function; A(s, a) is the dominance function; and |A| is the number of all actions a.

The technical effects of the invention are as follows: the disclosed unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method updates and retrieves server state information at a higher frequency, which keeps the calculated dynamic crash probability of each server accurate, keeps the parameters input to the resource scheduling and allocation algorithm accurate and true, and ensures that an accurate policy is output; the artificial intelligence-based resource allocation optimization algorithm is more effective, further reducing delay in the system while improving adaptability to network scale.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 is a schematic flow chart of the unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of the three-layer UAV execution system for non-independent and identically distributed computing tasks with crash probability introduced, according to an embodiment of the invention;

FIG. 3 is a graph of SFC link length versus time delay according to an embodiment of the invention;

FIG. 4 is a graph of the crash probability coefficient versus time delay according to an embodiment of the invention;

FIG. 5 is a graph of task data size versus time delay according to an embodiment of the invention;

FIG. 6 is a graph of execution layer UAV computing power versus time delay according to an embodiment of the invention;

FIG. 7 is a graph of the number of delay-sensitive SFCs versus time delay according to an embodiment of the invention.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

As shown in FIG. 1, the unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method comprises the following steps:

In step one, the parameters of the neural network are initialized with a Gaussian distribution: each parameter is randomly initialized from N(0, σ²). Methods for initializing a neural network include all-zero initialization, Gaussian initialization, and others. As an analogy, consider the training task y = ax + b: the parameters of the neural network are a and b, its input is x, and its output is y. Extensive training has shown that initializing the parameters from a Gaussian distribution improves the convergence of the neural network.
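Gaussian initialization of the two parameters in the y = ax + b analogy might look like the following. This is a minimal sketch; the function name `gaussian_init`, the value of σ, and the seed are illustrative assumptions, since the text does not specify them.

```python
import random


def gaussian_init(n_params, sigma=0.1, seed=None):
    """Draw n_params values independently from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_params)]


# Two parameters, a and b, for the toy model y = a*x + b.
params = gaussian_init(2, sigma=0.1, seed=0)
```

Unlike all-zero initialization, the drawn values differ from one another, so parameters do not start from identical values.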

In the second step, the first data information includes working state information of each unmanned aerial vehicle and a calculation task of a server state real-time updating system. The working state information of each unmanned aerial vehicle comprises the number of tasks currently executed by the server and the residual computing resources.

In step six, the neural network parameters are optimized by mini-batch stochastic gradient descent.
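Mini-batch stochastic gradient descent can be illustrated on the same toy model y = ax + b used in the initialization analogy. The sketch below is only an illustration of the update rule, not the network training used by the invention; the learning rate, batch size, step count, and synthetic data are all assumptions.

```python
import random


def minibatch_sgd(data, a=0.0, b=0.0, lr=0.1, batch_size=4, steps=2000, seed=0):
    """Fit y = a*x + b by minimizing the mean squared error on random mini-batches."""
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # draw a small random batch
        grad_a = sum(2 * (a * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum(2 * (a * x + b - y) for x, y in batch) / batch_size
        a -= lr * grad_a  # step against the batch gradient
        b -= lr * grad_b
    return a, b


# Synthetic noiseless data generated from y = 2x + 1 on x in [-1, 1].
data = [(x / 10, 2.0 * (x / 10) + 1.0) for x in range(-10, 11)]
a, b = minibatch_sgd(data)
```

Each step uses only a small random subset of the stored samples, which is the role the experience playback pool plays when the network is trained.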

In step two, the server state real-time updating system stores the number of tasks each server is currently executing and its remaining computing resources in a list that grows and shrinks as tasks start and finish, updates the parallel task information and remaining computing resources of each server in real time, and calculates the crash probability and feeds it back to the neural network.

In the sixth step, the method for judging the threshold value of the data amount stored in the experience playback pool comprises the following steps:

setting the threshold range to more than one half of the storage capacity of the experience playback pool; when the amount of data stored in the experience playback pool exceeds one half of its storage capacity, optimizing the neural network parameters; and when it does not exceed one half of the storage capacity, returning to step two.
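The half-capacity rule can be sketched as a small pool class. This is a hedged illustration: the class name `ExperiencePool`, its method names, and the choice to drop the oldest record when the pool is full are assumptions not stated in the text.

```python
from collections import deque


class ExperiencePool:
    """Hypothetical experience playback pool with a half-capacity training threshold."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Assumption: once full, the oldest record is discarded automatically.
        self.records = deque(maxlen=capacity)

    def store(self, uav_positions, first_data, execution_position):
        # One packaged record per scheduling decision (step five).
        self.records.append((uav_positions, first_data, execution_position))

    def ready_to_train(self):
        # True only when strictly more than half of the capacity is in use.
        return 2 * len(self.records) > self.capacity


pool = ExperiencePool(capacity=4)
pool.store([(0.0, 0.0, 10.0)], {"tasks": 1}, 2)
pool.store([(1.0, 0.0, 10.0)], {"tasks": 2}, 0)
half_full_ready = pool.ready_to_train()   # exactly half full: not yet
pool.store([(2.0, 0.0, 10.0)], {"tasks": 1}, 1)
over_half_ready = pool.ready_to_train()   # more than half full: train
```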

In the seventh step, the method for judging whether the neural network converges after optimizing the parameters of the neural network includes:

$$Q_{\eta,\alpha,\beta}(s,a) = V_{\eta,\alpha}(s) + A_{\eta,\beta}(s,a) - \frac{1}{|A|}\sum_{a'} A_{\eta,\beta}(s,a')$$

where η is the network parameter shared by the state value function and the dominance function, and α and β are the parameters of the state value function and the dominance function, respectively; the input of the neural network is the state s, which comprises computing task information and server state information; the output is an offloading decision, denoted a; a' is the action at the next moment; Q is the evaluation value of action a in state s; V(s) is the state value function; A(s, a) is the dominance function; and |A| is the number of all actions a.

Step two specifically comprises: first acquiring the position parameters of each unmanned aerial vehicle and calling the server state real-time updating system; obtaining from that system the working state of each unmanned aerial vehicle, which comprises the number of tasks the server is currently executing and its remaining computing resources; and acquiring all non-independent and identically distributed computing tasks generated in the system in the current time slot.

In step three, the position parameters of each unmanned aerial vehicle, the working state information of each unmanned aerial vehicle, and the computing tasks of the server state real-time updating system are input into the neural network, which outputs a task execution position. Specifically, the Q-value function is decomposed into a state value function and a dominance function (known as the advantage function in standard reinforcement learning terminology) so that the value of each action can be estimated better. In this patent, the input to the neural network is the state s and the output is the Q value of each action. At the last layer of the network, the output is split into two parts: a state value function V(s) and a dominance function A(s, a), where a denotes an action. The state value function represents the value of state s regardless of which action is taken, while the dominance function represents the value of taking action a in state s relative to the other actions. The Q-value function of the invention can be expressed as Q(s, a) = V(s) + A(s, a) - mean(A(s, :)), where mean(A(s, :)) is the mean of the dominance function over all actions. When selecting an action, the action a with the largest Q value, that is, argmax Q(s, a), is selected.
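The aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, :)) and the argmax selection can be written out directly. In this sketch the state value and the per-action dominance values are supplied as plain numbers; in the invention they would be produced by the two output heads of the neural network. The function names are illustrative assumptions.

```python
def dueling_q_values(state_value, dominance_values):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-subtracted form."""
    mean_dominance = sum(dominance_values) / len(dominance_values)
    return [state_value + adv - mean_dominance for adv in dominance_values]


def select_action(q_values):
    """Return the index of the action with the largest Q value (argmax)."""
    return max(range(len(q_values)), key=lambda action: q_values[action])


q = dueling_q_values(1.0, [0.5, 2.0, -1.0])  # mean dominance is 0.5
best = select_action(q)                      # q is [1.0, 2.5, -0.5], so best is 1
```

Subtracting the mean makes the average of the Q values equal V(s), which pins down how a given Q is split between the two heads.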

In the three-layer UAV execution system for non-independent and identically distributed computing tasks with crash probability introduced, shown in FIG. 2, the UAVs are organized in three layers, each with a different function. The lowest layer, the collection layer, is responsible for task collection rather than task execution: each collection layer UAV generates computing tasks with task type requirements through its own collection equipment and offloads them to the transport layer UAVs. The tasks of different collection layer UAVs do not affect one another but are correlated, and they must be executed consecutively in a certain order to obtain a result. The position of collection layer UAV i is expressed as $L_{c,i} = [x_{c,i}, y_{c,i}, z_{c,i}]$. Let $y = \{f_1, f_2, \ldots, f_{|T|}\}$ denote the set of virtual network functions (VNFs) deployed in the execution layer UAVs; each execution layer UAV deploys one of the VNFs. The information transmitted to the transport layer UAV is $\mathrm{VNF}_i = [t_i, f_i, d_i, p_i]$, where $t_i$ is the task generation time, $f_i$ is the type of VNF the task requires, $d_i$ is the size of the task's computing data, and $p_i$ is the delay-sensitivity flag.

The transport layer UAVs sit in the middle layer of the three-layer system. They receive the computing tasks offloaded by the collection layer UAVs, compose them into service function chain (SFC) computing tasks according to rules, and send them to the execution layer UAVs. Each transport layer UAV receives only the computing tasks offloaded by its assigned collection layer UAVs. Let Γ denote an SFC task request sent by a transport layer UAV, defined as the tuple $(f_j, t_j, R_j, d_j, p_j)$, where $f_j$ is the set of required VNF types, $t_j$ is the SFC task sending time, $R_j$ is the task reliability requirement of request $r_j$ with $0 < R_j \le 1$, $d_j$ is the set of computing data sizes, and $p_j$ is the delay-sensitivity flag.
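The two records described above, the per-task VNF information and the SFC request tuple, can be modeled as small data classes. This is an illustrative sketch: the field types are assumptions, and `compose_sfc` uses a hypothetical composition rule (order by generation time, flag the chain if any task is flagged), since the text says only that tasks are composed "according to rules".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VNFTask:
    t: float  # task generation time t_i
    f: int    # required VNF type f_i
    d: float  # computing data size d_i
    p: bool   # delay-sensitivity flag p_i


@dataclass
class SFCRequest:
    f: List[int]    # required VNF types, in execution order
    t: float        # SFC task sending time t_j
    R: float        # reliability requirement, 0 < R <= 1
    d: List[float]  # data sizes, aligned with f
    p: bool         # delay-sensitivity flag p_j

    def __post_init__(self):
        if not 0.0 < self.R <= 1.0:
            raise ValueError("reliability requirement must satisfy 0 < R <= 1")


def compose_sfc(tasks, send_time, reliability):
    """Hypothetical rule: order tasks by generation time into one chain."""
    ordered = sorted(tasks, key=lambda task: task.t)
    return SFCRequest(
        f=[task.f for task in ordered],
        t=send_time,
        R=reliability,
        d=[task.d for task in ordered],
        p=any(task.p for task in ordered),
    )


tasks = [VNFTask(t=2.0, f=3, d=40.0, p=False), VNFTask(t=1.0, f=1, d=25.0, p=True)]
request = compose_sfc(tasks, send_time=2.5, reliability=0.9)
```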

The invention includes a server state real-time updating system. A dynamically updated crash probability is introduced for each execution layer UAV, and its value is positively correlated with the number of tasks the device executes at the same time. The crash probability of each execution layer UAV is updated dynamically according to its load. UAV i has a base crash probability $\rho'_i$. Because the crash probability grows in proportion to the number of tasks executed simultaneously, it is assumed to increase by $\Delta(\rho_i)$ for each additional task, so the final crash probability is $\rho_i = \rho'_i + n \cdot \Delta(\rho_i)$, where n is the number of tasks the UAV is executing simultaneously. Assuming that the crash of one UAV does not affect other UAVs, the reliability $R^{\gamma}$ of an SFC is the product of the reliabilities of the UAVs it uses: $R^{\gamma} = \prod_i r_i$, where $r_i$ is the reliability of execution layer UAV i used by the SFC, given by $r_i = 1 - \rho_i$.

SFCs in the system are classified as delay-sensitive or non-delay-sensitive, flagged in the code by the value at a specific position. A delay-sensitive SFC is allocated more computing resources and given backup SFCs, which improves the execution speed and stability of its tasks and ensures that they complete in time. Taking one backup SFC as an example: when executing tasks, the delay-sensitive SFC first selects the optimal execution layer UAVs according to the algorithm's policy, and the backup SFC simultaneously selects the suboptimal execution layer UAVs according to the same policy to execute the tasks. The advantage of having several different device chains execute the same SFC in parallel is that a crash of a device on one chain does not affect execution on the other chains, which greatly reduces the high delay caused by crashes of terminal devices.
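The load-dependent crash probability and the product-form SFC reliability can be sketched together with the growing and shrinking task list kept by the server state real-time updating system. The class and method names are hypothetical, the numeric values are illustrative, and clamping the crash probability at 1 is an added assumption that the text does not state.

```python
import math


class ExecutionUAVState:
    """Hypothetical per-UAV record kept by the server state updating system."""

    def __init__(self, total_resources, base_crash, crash_step):
        self.total_resources = total_resources
        self.base_crash = base_crash  # base crash probability rho'_i
        self.crash_step = crash_step  # increment Delta(rho_i) per running task
        self.running = []             # list that grows and shrinks with tasks

    def start_task(self, task_id, resources):
        if resources > self.remaining_resources():
            raise ValueError("not enough remaining computing resources")
        self.running.append((task_id, resources))

    def finish_task(self, task_id):
        self.running = [(tid, res) for tid, res in self.running if tid != task_id]

    def remaining_resources(self):
        return self.total_resources - sum(res for _, res in self.running)

    def crash_probability(self):
        # rho_i = rho'_i + n * Delta(rho_i); the clamp to 1.0 is an assumption.
        return min(1.0, self.base_crash + len(self.running) * self.crash_step)

    def reliability(self):
        return 1.0 - self.crash_probability()  # r_i = 1 - rho_i


def sfc_reliability(uavs):
    """Reliability of an SFC: product of the reliabilities of the UAVs it uses."""
    return math.prod(uav.reliability() for uav in uavs)


uav_a = ExecutionUAVState(total_resources=100.0, base_crash=0.01, crash_step=0.02)
uav_b = ExecutionUAVState(total_resources=100.0, base_crash=0.05, crash_step=0.01)
uav_a.start_task("t1", 30.0)
uav_a.start_task("t2", 20.0)
uav_b.start_task("t3", 10.0)
chain_reliability = sfc_reliability([uav_a, uav_b])  # 0.95 * 0.94
```

Because the state is updated on every task start and finish, the crash probability fed to the neural network reflects the current load rather than a stale snapshot.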

The execution layer UAVs are located at the uppermost layer of the three-layer UAV system. The execution position of each SFC task sent by the transport layer UAVs is planned and scheduled by the artificial intelligence system. Within one SFC, the UAV that executes an earlier task forwards its computation result, together with the raw data of the remaining tasks in the chain, to the UAV that executes the next task, until all tasks are complete. The channel gain of the line-of-sight link between UAVs is expressed as:

$$g_{k,l}(i) = \alpha_0 \, d_{k,l}^{-2}(i) = \frac{\alpha_0}{H^2 + \lVert q(i) - p_{k,l}(i) \rVert^2}$$

where g is the channel gain, $\alpha_0$ is the channel gain at the reference distance d = 1 m, $d_{k,l}(i)$ is the Euclidean distance between the receiving UAV k and the transmitting UAV l, i is the time slot index, $q(i)$ is the horizontal-plane coordinate of the transmitting UAV l, $p_{k,l}(i)$ is the horizontal-plane coordinate of the receiving UAV k, and H is the height difference between UAV l and UAV k, where l and k are UAV serial numbers. Due to obstruction, the wireless transmission rate is expressed as:

where B is the communication bandwidth, $p_{up}$ is the transmission power with which UAVs transmit computing task data over the uplink, and $\sigma^2$ is the noise power.
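The line-of-sight channel gain between two UAVs can be computed from the quantities defined above. The sketch assumes the common free-space form in which gain falls with the square of the Euclidean distance, g = α₀ / (H² + ‖q − p‖²); this inverse-square form and the function name are assumptions made for illustration. The transmission rate is not sketched, because its formula is not recoverable from the text.

```python
def channel_gain(alpha0, tx_xy, rx_xy, height_diff):
    """Line-of-sight gain, assuming an inverse-square dependence on distance.

    alpha0      -- channel gain at the 1 m reference distance
    tx_xy       -- horizontal-plane coordinates of the transmitting UAV l
    rx_xy       -- horizontal-plane coordinates of the receiving UAV k
    height_diff -- height difference H between the two UAVs
    """
    dx = tx_xy[0] - rx_xy[0]
    dy = tx_xy[1] - rx_xy[1]
    distance_sq = height_diff ** 2 + dx ** 2 + dy ** 2  # squared Euclidean distance
    if distance_sq == 0:
        raise ValueError("transmitter and receiver positions coincide")
    return alpha0 / distance_sq


# UAVs 3 m apart horizontally and 4 m apart vertically are 5 m apart in total.
gain = channel_gain(1.0, (0.0, 0.0), (3.0, 0.0), 4.0)
```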

Dueling DQN forces the dominance function of the optimal action to output 0, calculated as follows:

$$Q_{\eta,\alpha,\beta}(s,a) = V_{\eta,\alpha}(s) + A_{\eta,\beta}(s,a) - \max_{a'} A_{\eta,\beta}(s,a')$$

where η is the network parameter shared by the state value function and the dominance function, and α and β are the parameters of the state value function and the dominance function, respectively.

In this case, the uniqueness of the value modeling is ensured. In the implementation, the maximization operation is replaced by averaging, namely:

$$Q_{\eta,\alpha,\beta}(s,a) = V_{\eta,\alpha}(s) + A_{\eta,\beta}(s,a) - \frac{1}{|A|}\sum_{a'} A_{\eta,\beta}(s,a')$$

Part of the reason Dueling DQN outperforms DQN is that Dueling DQN learns the state value function more efficiently. Every update changes the function V, which in turn affects the Q values of the other actions. A conventional DQN, by contrast, updates only the Q value of one action and leaves the Q values of the other actions unchanged. Dueling DQN can therefore learn the state value function more frequently and accurately.

Simulation experiments were carried out on the FP-MEC-II system formed by three layers of unmanned aerial vehicles, and the performance of the proposed algorithm was evaluated from the experimental results. The main parameters are shown in Table 1.

TABLE 1

The triangle in the center marks the position of the base station and the MEC server, and the dots mark the positions of the randomly distributed IIN devices, each no more than 200 meters from the base station. The proposed algorithm is compared with four other methods in the following simulation scenarios. "All local" means that all IIN devices execute their computing tasks with their own resources. "Fully offloaded" means that all IIN devices offload their computing tasks to the MEC server, which divides its computing resource F equally among the tasks. "Random offload" means that each IIN device's computing task is randomly chosen to execute locally or to be offloaded to the MEC server, with the server's computing resource F divided equally among the offloaded tasks. "Q learning" means that the Q-learning algorithm determines the offloading position of each task.

As shown in FIG. 3, each plotted point is the mean of a large number of simulation runs in which task data are generated randomly within a given range. Such randomness allows the relative merits of the algorithms to be seen. The figure shows that, as the SFC link length varies, the proposed DRDOA algorithm obtains results closer to those of the traversal search method. When the SFC link length is short, few computing tasks are generated in the system and each execution layer UAV has ample remaining computing resources, so the difference among the four algorithms is smallest. As the SFC link length grows, the number of computing tasks generated in the system increases, putting the servers under resource pressure. The DRDOA algorithm analyzes the generated computing tasks and the servers' resource usage, and here its advantage in flexibly scheduling and assigning task execution positions becomes apparent. Although Q-learning also takes the computing tasks and the server state as inputs, a limitation of its design is that each training iteration updates the Q value of only one state-action pair, while the values of the other actions are not updated. Compared with Q-learning, DRDOA can therefore learn the state value function more frequently and accurately and select better actions.

As shown in FIG. 4, because the nearest search method selects only the nearest execution layer UAV for task offloading, its system delay is strongly affected by the crash probability coefficient. Q-learning, DRDOA, and the traversal search algorithm schedule the execution positions of computing tasks according to the working states of all execution layer UAVs in the system, which reduces the influence of the crash probability coefficient on the total system delay. The growth curve of the DRDOA algorithm is the closer of the two to that of the traversal search algorithm, which shows that DRDOA performs well.

As shown in FIG. 5, the simulation parameters set the number of transport layer UAVs to 6, which also raises the number of SFCs to 6, of which only one is delay-sensitive. The simulation results show that the nearest search method still uses the system's overall resources inefficiently under multiple tasks and large data volumes. Faced with multiple tasks and large data volumes, the proposed DRDOA remains closer to the result of the traversal algorithm and achieves lower delay than Q-learning and the nearest search method. In complex working scenarios with multiple SFCs, the proposed DRDOA algorithm can therefore provide users with lower-delay resource scheduling.

As shown in FIG. 6, the simulation parameters set the number of transport layer UAVs to 6, which corresponds to 6 SFC tasks to be executed, of which only one is delay-sensitive. The simulation results show that the system delay under all four algorithms falls as the computing capacity of the devices increases, and that the rate of decrease slows, because for a fixed set of computing tasks excess computing resources go unused. Compared with Q-learning and the nearest search method, the proposed DRDOA remains closer to the traversal algorithm in its rate of decrease and achieves lower delay, which verifies the performance of the algorithm.

As shown in FIG. 7, the simulation parameters set the number of transport layer UAVs to 6, which corresponds to 6 SFC tasks to be executed. The simulation introduces a task rejection mechanism: if the combined reliability of the selected execution devices meets the task's reliability requirement, the task is accepted; otherwise it is rejected. The delay plotted in the figure is computed as: delay of executed tasks + (number of rejected tasks × 200). Rejecting a task incurs more delay than executing it normally. The advantage of the Q-learning and DRDOA algorithms is that they schedule the computing power in the system as comprehensively as possible to satisfy the reliability requirements of the computing tasks and reduce the number of rejected tasks. The simulation results show that the system delay under all four algorithms rises as the proportion of delay-sensitive tasks increases. The nearest search method underuses the computing resources in the system, so its number of rejected tasks grows as computing tasks become denser, and its overall delay rises. The proposed DRDOA uses system resources more efficiently than Q-learning. Among the three algorithms, DRDOA is closest to the traversal algorithm and has lower delay, which verifies the performance of the algorithm.
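The rejection mechanism and the plotted delay can be written as a short function. The penalty of 200 per rejected task comes from the delay formula in the text; the tuple layout, the function name, and the sample values are illustrative assumptions.

```python
REJECT_PENALTY = 200  # delay charged per rejected task, from the delay formula


def plotted_delay(tasks, penalty=REJECT_PENALTY):
    """Total delay for a list of (execution_delay, chain_reliability, required_reliability).

    A task is accepted when the combined reliability of its selected devices
    meets its requirement; otherwise it is rejected and charged the penalty.
    """
    executed = [delay for delay, reliability, required in tasks if reliability >= required]
    rejected = len(tasks) - len(executed)
    return sum(executed) + rejected * penalty


tasks = [
    (12.0, 0.95, 0.90),  # accepted
    (30.0, 0.80, 0.90),  # rejected: reliability below the requirement
    (8.0, 0.99, 0.99),   # accepted: requirement met exactly
]
total = plotted_delay(tasks)  # 12.0 + 8.0 + 1 * 200
```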

The disclosed unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method updates and retrieves server state information at a higher frequency, which keeps the calculated dynamic crash probability of each server accurate, keeps the parameters input to the resource scheduling and allocation algorithm accurate and true, and ensures that an accurate policy is output; the artificial intelligence-based resource allocation optimization algorithm is more effective, further reducing delay in the system while improving adaptability to network scale.

The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method is characterized by comprising the following steps of:

2. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method of claim 1, wherein in step one, the parameters of the neural network are initialized with a Gaussian distribution.

3. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method according to claim 1, wherein in step two, the first data information comprises the working state information of each unmanned aerial vehicle and the computing tasks of the server state real-time updating system.

4. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method of claim 3, wherein the working state information of each unmanned aerial vehicle comprises the number of tasks the server is currently executing and its remaining computing resources.

5. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method of claim 1, wherein in step six, the neural network parameters are optimized by mini-batch stochastic gradient descent.

6. The unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method according to claim 1, wherein in step two, the server state real-time updating system stores the number of tasks each server is currently executing and its remaining computing resources in a list that grows and shrinks as tasks start and finish, updates the parallel task information and remaining computing resources of each server in real time, and calculates the crash probability and feeds it back to the neural network.

7. The unmanned aerial vehicle-assisted failure-prone mobile edge computing resource scheduling optimization method according to claim 1, wherein in the sixth step, the method for performing threshold judgment on the amount of data stored in the experience playback pool comprises:

8. The unmanned aerial vehicle-assisted failure-prone mobile edge computing resource scheduling optimization method according to claim 5, wherein in the seventh step, the method for determining whether the neural network after optimizing the neural network parameters converges comprises: