WO2022242468A1 - Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium - Google Patents

Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022242468A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
model
scheduling
offloading
optimization
Prior art date
Application number
PCT/CN2022/091260
Other languages
French (fr)
Chinese (zh)
Inventor
任涛
胡哲源
谷宁波
牛建伟
胡舒程
李青锋
Original Assignee
北京航空航天大学杭州创新研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110537588.4A (CN112988285B)
Priority claimed from CN202110765005.3A (CN113254188B)
Application filed by 北京航空航天大学杭州创新研究院
Publication of WO2022242468A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating

Definitions

  • the present application relates to the technical field of task offloading and scheduling optimization, and in particular, to a task offloading method, a scheduling optimization method and device, electronic equipment, and a storage medium.
  • the computing offloading problem, that is, whether the wireless user equipment offloads its computing tasks to a nearby server or executes them locally, and how to allocate resources (such as computing resources and energy resources) to the tasks offloaded to the server.
  • UAVs can be used as communication relay stations or edge computing platforms.
  • it is necessary to properly determine the scheduling decision for UAV computing tasks in the mobile edge computing network (whether the computing task is executed locally on the mobile device or dispatched to the UAV or base station) in order to obtain the desired performance.
  • one aspect of the present application provides a task offloading method and device, electronic equipment, and a storage medium, so as to improve the problems existing in related technologies.
  • One aspect of the present application provides a task offloading method, the task offloading method is applied to an electronic device, the electronic device is communicatively connected to a task offloading system, and the task offloading system includes a second device and at least one first device,
  • the task offloading method may include:
  • the task offloading method may also include the step of obtaining a task offloading model, which may include:
  • the system model is trained according to the optimized cost function to obtain a task offloading model.
  • the step of establishing a system model and optimizing a cost function according to the cost parameters of the task offloading system may include:
  • An optimization cost function is established based on the system model.
  • the task offloading model includes a first task offloading model and a second task offloading model
  • the step of training the system model according to the optimization cost function to obtain the task offloading model may include:
  • the system model is trained according to the second optimization cost function to obtain a second task offloading model.
  • the task offloading strategy includes a first task offloading strategy and a second task offloading strategy
  • the step of inputting the task to be processed into a preset task offloading model to obtain a task offloading strategy may include:
  • the task to be processed is input into the second task offloading model to obtain a second task offloading policy.
  • the step of training the system model according to the first optimization cost function to obtain a first task offloading model may include:
  • the deep reinforcement learning model is trained according to the first optimized cost function to obtain a first task offloading model.
  • the step of training the system model according to the second optimization cost function to obtain a second task offloading model may include:
  • the alternating direction multiplier method model is trained according to the second optimization cost function to obtain a second task offloading model.
  • the present application also provides a task offloading device, the task offloading device is applied to an electronic device, and the electronic device is communicatively connected to a task offloading system, the task offloading system includes a second device and at least one first device, and the task offloading device may include:
  • a task acquisition module configured to acquire pending tasks of the at least one first device, wherein the pending tasks include target tasks
  • a strategy acquisition module configured to input the task to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;
  • a policy sending module configured to send the task offloading policy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading policy, where the second device executes the target task.
  • the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • when the processor executes the program, the task offloading method described in any one of the preceding embodiments is implemented.
  • the present application provides a storage medium, the storage medium includes a computer program, and when the computer program runs, the computer program controls the electronic device where the storage medium is located to execute the task offloading method described in any one of the foregoing implementation manners.
  • the task offloading strategy is obtained by inputting the task to be processed into the task offloading model, and the task offloading strategy is sent to the first device, so that the first device can offload the target task to the second device for processing based on the task offloading strategy.
  • this realizes the offloading of the target task to the server for processing and avoids the problem of low task offloading efficiency in related technologies, in which all tasks are either performed locally on the wireless user equipment or offloaded and performed remotely on the server.
  • Another aspect of the present application also provides a method and device for scheduling optimization, electronic equipment, and storage media, so as to improve the problems existing in related technologies.
  • the scheduling optimization method is applied to an electronic device, and the electronic device is communicatively connected to a mobile edge computing network system.
  • the mobile edge computing network system includes at least one base station, at least one unmanned aerial vehicle (UAV), and at least one mobile device, and the scheduling optimization method may include:
  • acquiring pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;
  • sending the scheduling strategy, obtained by inputting the pending tasks and the current location information into a preset scheduling optimization model, to the at least one mobile device, so that the at least one mobile device sends the first task to the at least one UAV for processing based on the scheduling strategy, and the second task is forwarded by the at least one UAV to the at least one base station for processing.
  • the scheduling optimization method may be implemented by using the task offloading method according to the implementation manners of the present application.
  • the scheduling optimization method further includes the step of obtaining a scheduling optimization model, which may include:
  • the initial model is trained according to the optimization objective function to obtain a scheduling optimization model.
  • the step of establishing an initial model and optimizing an objective function according to the initial parameters of the mobile edge computing network system may include:
  • An optimization objective function is established according to the initial model.
  • the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and the step of training the initial model according to the optimization objective function to obtain the scheduling optimization model may include:
  • the initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, and the initial model is trained according to the second optimization objective function to obtain the computing task joint scheduling model , training the initial model according to the third optimization objective function to obtain the resource allocation model.
  • the step of inputting the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy may include:
  • the step of inputting the current location information into the UAV trajectory planning model and calculating the predicted location information of the at least one mobile device may include:
  • the step of inputting the to-be-processed tasks and predicted location information into the task joint scheduling model, and calculating the task scheduling decision variables of the at least one mobile device may include:
  • the decision-making actions are integrated to obtain task scheduling decision variables.
  • the present application provides a scheduling optimization device, which is applied to an electronic device, and the electronic device is communicatively connected to a mobile edge computing network system.
  • the mobile edge computing network system includes at least one base station, at least one unmanned aerial vehicle (UAV), and at least one mobile device.
  • the scheduling optimization device includes:
  • the task acquisition module may be configured to: acquire the pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;
  • the strategy acquisition module may be configured to: input the task to be processed and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;
  • a policy sending module configured to: send the scheduling policy to the at least one mobile device, so that the at least one mobile device sends the first task to the at least one UAV for processing based on the scheduling policy, and forwards the second task to the at least one base station through the at least one UAV for processing.
  • the scheduling optimization device is implemented as the task offloading device according to the implementation manners of the present application.
  • the present application provides an electronic device, which may include: a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • when the processor executes the program, it implements the task offloading method and/or the scheduling optimization method of any of the foregoing embodiments.
  • the present application provides a storage medium, the storage medium may include a computer program, and when the computer program runs, the electronic device where the storage medium is located is controlled to execute the task offloading method and/or the scheduling optimization method described in any one of the preceding embodiments.
  • the scheduling strategy is obtained by inputting the tasks to be processed and the current location information into the preset scheduling optimization model, and the scheduling strategy is sent to at least one mobile device, so that the at least one mobile device sends the first task to at least one UAV for processing based on the scheduling strategy, and forwards the second task to at least one base station through the at least one UAV for processing. The first task is thus dispatched to the UAV and the second task to the base station, which avoids the low scheduling optimization efficiency of related technologies, in which tasks are either all executed locally on the mobile device or all dispatched to the UAV or the base station for remote execution.
  • Fig. 1 shows a structural block diagram of a data processing system provided by some embodiments of the present application.
  • FIG. 2 shows a structural block diagram of a task offloading system provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a task offloading method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a task offloading model provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a deep reinforcement learning model provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of the ECRA algorithm provided by the embodiment of the present application.
  • FIG. 7 is another schematic flow chart of the task offloading method provided by the embodiment of the present application.
  • Fig. 8 shows a structural block diagram of a data processing system provided by other embodiments of the present application.
  • FIG. 9 shows a structural block diagram of a scheduling optimization system provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a scheduling optimization method provided by an embodiment of the present application.
  • FIG. 11 is another schematic flowchart of the scheduling optimization method provided by the embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a scheduling optimization model provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of the LSTM network provided by the embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an LSTM network-based mobile device location prediction model provided by an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of an FCM-based mobile device clustering algorithm provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of the actor neural network and the evaluator neural network provided by the embodiment of the present application.
  • FIG. 17 is a schematic flowchart of a DDPG-based computing task scheduling algorithm provided by an embodiment of the present application.
  • FIG. 18 is a schematic flowchart of a scheduling variable shaping integration algorithm provided by an embodiment of the present application.
  • FIG. 19 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 20 is a structural block diagram of a task offloading device provided by an embodiment of the present application.
  • FIG. 21 is a structural block diagram of a scheduling optimization device provided by an embodiment of the present application.
  • Reference numerals: 10 - data processing system; 100 - electronic device; 110 - first memory; 120 - first processor; 130 - communication module; 200 - task offloading system; 300 - scheduling optimization system; 210 - first device; 220 - second device; 400 - task offloading device; 410 - task acquisition module; 420 - offloading strategy acquisition module; 430 - offloading strategy sending module; 500 - task scheduling device; 510 - task acquisition module; 520 - scheduling strategy acquisition module; 530 - scheduling policy sending module.
  • Mobile edge computing (MEC) is a promising technology. By deploying edge servers in the edge computing network, it can provide powerful computing power and energy resources for users' mobile devices: computation-intensive tasks are offloaded to the edge server to reduce task execution delay and save the battery energy consumed by local devices.
  • wireless power transfer (WPT)
  • the computing offloading problem, that is, whether the wireless user equipment offloads its computing tasks to a nearby MEC server or executes them locally, and how to allocate resources (such as computing resources and energy resources) to the tasks offloaded to the server.
  • a wireless network consists of multiple wireless user equipments, and the dynamic change of time-varying channel conditions caused by the mobility of wireless user equipments complicates the offload scheduling process.
  • a good computing offload strategy can improve the overall computing power of wireless user equipment and enhance the performance of mobile edge computing systems. Therefore, a lot of recent research and inventions have focused on designing efficient computation offloading and resource allocation strategies.
  • Some existing inventions or researches propose the use of dynamic programming algorithms and branch-and-bound methods to offload computing tasks and allocate resources in mobile edge computing networks.
  • these methods have high computational complexity and take a long time to solve for the optimization variables, so they are only applicable to scenarios with relatively simple network environments.
  • although offloading optimization methods based on heuristic algorithms can reduce computational complexity, such methods usually require a large number of computational iterations to achieve satisfactory optimization results, so they may not be practical for online computation offloading in dynamic mobile edge computing networks (i.e., networks with time-varying channel conditions caused by the movement of wireless user equipment).
  • embodiments of the present application provide a task offloading method and device, electronic equipment, and a storage medium.
  • the technical solution of the present application will be described below through possible implementation modes.
  • Fig. 1 is a structural block diagram of a data processing system 10 provided by some embodiments of the present application, which shows a possible implementation of the data processing system 10. Referring to Fig. 1, the data processing system 10 may include one or more of an electronic device 100 and a task offloading system 200.
  • the electronic device 100 communicates with the task offloading system 200, and the electronic device 100 obtains the tasks to be processed by the task offloading system 200, and obtains a task offloading strategy according to the pending tasks, so that the task offloading system 200 performs task offloading processing according to the task offloading strategy.
  • the specific composition of the task offloading system 200 is not limited, and can be set according to actual application requirements.
  • the task offloading system 200 may include a second device 220 and at least one first device 210 .
  • the electronic device 100 and the first device 210 may be the same device; in another alternative example, the electronic device 100 and the second device 220 may be the same device.
  • first device 210 and the second device 220 are not limited, and may be set according to actual application requirements.
  • first device 210 may be a wireless user device
  • second device 220 may be an edge computing server.
  • Each wireless user equipment is equipped with a wireless transmission antenna, which can perform data transmission with the wireless access point, and can also receive energy from the wireless access point. The energy received from the wireless access point is stored in the rechargeable battery of the wireless user device.
  • FIG. 3 shows one of the flow charts of the task offloading method provided by the embodiment of the present application.
  • the method can be applied to the electronic device 100 shown in FIG. 19 (described below), and is executed by the electronic device 100 in FIG. 19 . It should be understood that in other embodiments, the order of some steps in the task offloading method of this embodiment may be exchanged according to actual needs, or some steps may be omitted or deleted.
  • the flow of the task offloading method shown in FIG. 3 will be described in detail below.
  • Step S310: acquiring pending tasks of at least one first device 210.
  • the tasks to be processed include target tasks.
  • Step S320 inputting the tasks to be processed into a preset task offloading model to obtain a task offloading strategy.
  • the task offloading model is trained based on the established system model.
  • Step S330 sending the task offloading policy to at least one first device 210, so that at least one first device 210 offloads the target task to the second device 220 based on the task offloading policy, and the second device 220 executes the target task.
  • the above method obtains the task offloading strategy by inputting the tasks to be processed into the task offloading model, and sends the task offloading strategy to the first device, so that the first device offloads the target task to the second device for processing based on the task offloading strategy.
  • the tasks are offloaded to the server for processing, which avoids the problem of low efficiency of task offloading caused by the related technologies that all tasks are either executed locally on the wireless user equipment, or all offloaded and executed remotely on the server.
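  • As a non-limiting illustration, steps S310 to S330 can be sketched as a single scheduling round. The names `Task`, `offload_round`, and `threshold_model` are hypothetical and do not appear in the application; in particular, `threshold_model` is only a stand-in for the trained task offloading model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    device_id: int    # index of the first device that generated the task
    data_bits: float  # amount of task data, in bits


# Hypothetical policy type: device_id -> True (offload to the second device)
# or False (execute locally on the first device).
Policy = Dict[int, bool]


def offload_round(pending: List[Task],
                  model: Callable[[List[Task]], Policy],
                  send: Callable[[int, bool], None]) -> Policy:
    """One round: S310 supplies `pending`, S320 queries the model, S330 sends."""
    policy = model(pending)                      # S320: obtain the offloading strategy
    for device_id, decision in policy.items():   # S330: send it to each first device
        send(device_id, decision)
    return policy


def threshold_model(tasks: List[Task]) -> Policy:
    """Stand-in for the trained model: offload tasks larger than 1 Mbit."""
    return {t.device_id: t.data_bits > 1e6 for t in tasks}


sent: Policy = {}
policy = offload_round([Task(0, 5e5), Task(1, 2e6)], threshold_model, sent.__setitem__)
```

  • Passing the model and the sender as callables keeps the sketch independent of how the task offloading model is trained and of the transport used between the electronic device 100 and the first devices 210.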
  • the task offloading method provided by the present application may also include the step of obtaining a task offloading model, which may include:
  • the specific ways of establishing the system model and optimizing the cost function according to the cost parameters of the task offloading system 200 are not limited, and can be set according to actual application requirements.
  • the following sub-steps may be included:
  • each time slice is T seconds long, and it is assumed that each wireless user equipment generates a computation-intensive task in time slice t and that the execution time of these tasks does not exceed the length of one time slice.
  • the computing power of the MEC server where the wireless access point is deployed is much stronger than that of the wireless user equipment. Therefore, each wireless user equipment can choose to perform tasks remotely on the server by offloading calculations, or choose to perform tasks locally.
  • the channel gain between the i-th wireless user equipment and the wireless access point in time slice t is modeled per time slice, and the time slice is short enough that the channel gain remains unchanged within it.
  • the wireless channel gain is expressed in terms of an independent exponential random variable with unit mean and an average channel gain, where the average channel gain is determined by the following quantities:
  • A_g represents the antenna gain
  • f_c represents the carrier frequency
  • l_e represents the path loss exponent
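  • The channel-gain formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes a commonly used free-space path-loss form with Rayleigh fading, in which the average gain depends on the antenna gain, carrier frequency, distance, and path loss exponent, and the per-slice gain is the average gain multiplied by a unit-mean exponential random variable. The function names, the distance parameter, and the numeric values are assumptions, not values from the application.

```python
import math
import random

SPEED_OF_LIGHT = 3.0e8  # metres per second


def average_gain(antenna_gain: float, carrier_hz: float,
                 distance_m: float, path_loss_exp: float) -> float:
    """Assumed free-space model: A_g * (c / (4 * pi * f_c * d)) ** l_e."""
    ratio = SPEED_OF_LIGHT / (4 * math.pi * carrier_hz * distance_m)
    return antenna_gain * ratio ** path_loss_exp


def sample_gain(avg_gain: float, rng: random.Random) -> float:
    """Per-slice gain: average gain times a unit-mean exponential variable."""
    return avg_gain * rng.expovariate(1.0)


# Illustrative numbers only.
near = average_gain(4.11, 915e6, 10.0, 2.8)
far = average_gain(4.11, 915e6, 20.0, 2.8)
slot_gain = sample_gain(near, random.Random(0))
```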
  • the edge computing server charges each user equipment for q_t T seconds through wireless power transmission technology, where q_t ∈ [0, 1]; the energy obtained by the i-th wireless user equipment is:
  • an efficiency coefficient in the interval (0, 1) represents the efficiency of wireless energy harvesting
  • P_i represents the transmission power between the wireless access point and the user equipment
  • q_t represents the time ratio of wireless charging.
  • E_t is the energy consumed in time slice t
  • H_t is the energy obtained through wireless power transmission technology in time slice t
  • M_{t+1} should be a non-negative value. If the energy in the current time slice is insufficient (M_{t+1} < 0), the wireless user equipment discards the current task, sets M_{t+1} to 0, and re-executes the task in the next time slice.
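  • As a non-limiting sketch of the energy bookkeeping described above, the code below assumes that the harvested energy is the product of the harvesting efficiency, the transmission power, the channel gain, and the charging time q_t T, and that the battery level evolves as the previous level minus the consumed energy plus the harvested energy. Both formulas, the cap at a maximum battery level, and all function names are assumptions made for illustration; the application's own formulas are not reproduced in this text.

```python
from typing import Tuple


def harvested_energy(efficiency: float, tx_power: float, gain: float,
                     q_t: float, slot_len: float) -> float:
    """Assumed form: efficiency * P_i * channel gain * q_t * T."""
    return efficiency * tx_power * gain * q_t * slot_len


def battery_step(m_t: float, e_t: float, h_t: float,
                 m_max: float) -> Tuple[float, bool]:
    """Return (M_{t+1}, task_dropped).

    Assumed update: M_{t+1} = M_t - E_t + H_t. If the result is negative,
    the task is dropped and the level is set to 0, as the text describes.
    The cap at m_max is an assumption.
    """
    m_next = m_t - e_t + h_t
    if m_next < 0:
        return 0.0, True
    return min(m_next, m_max), False


ok_level, ok_dropped = battery_step(1.0, 0.5, 0.25, 2.0)
low_level, low_dropped = battery_step(0.25, 1.0, 0.5, 2.0)
```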
  • the task generated by the i-th wireless user equipment in time slice t is described by two quantities: the amount of task data (unit: bit) and the number of CPU cycles required to process 1 bit of data. The number of CPU cycles required to execute the task is therefore the product of these two quantities. Define W as the bandwidth of the wireless channel; the interference between channels can be ignored. If k wireless user equipments offload their current tasks at the same time in time slice t, the wireless bandwidth W is evenly allocated among the user equipments that decide to offload.
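  • The task model and the equal bandwidth split can be illustrated with the following minimal sketch. The function names are hypothetical; the decision list uses 1 for offloading and 0 for local execution.

```python
from typing import List


def required_cycles(data_bits: float, cycles_per_bit: float) -> float:
    """CPU cycles needed for a task: data amount times cycles per bit."""
    return data_bits * cycles_per_bit


def per_device_bandwidth(total_bandwidth: float, decisions: List[int]) -> float:
    """Bandwidth given to each offloading device when W is shared equally.

    Returns 0.0 when no device offloads in this time slice.
    """
    k = sum(decisions)  # number of devices that decided to offload
    return total_bandwidth / k if k else 0.0


cycles = required_cycles(1_000_000, 100)          # 1 Mbit at 100 cycles per bit
share = per_device_bandwidth(2e6, [1, 0, 1, 1, 1])  # 4 of 5 devices offload
```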
  • each wireless user equipment After obtaining the energy transmitted from the wireless access point, each wireless user equipment needs to decide whether to offload the computing task to the edge server or execute it locally, so as to optimize the scheduling to reduce the delay and energy consumption of the overall task.
  • This application adopts a complete offloading method, that is, tasks arriving in the current time slice are either executed locally on the wireless user equipment, or remotely executed on the MEC server through computing offload.
  • Let x_i^t denote the offloading decision variable of the i-th wireless user equipment in time slice t, where x_i^t = 1 indicates that the wireless user equipment chooses to offload to the edge computing server (edge computing), and x_i^t = 0 indicates that the computing task is performed locally on the wireless user equipment.
  • the wireless user equipment in the mobile edge computing network of this application can obtain power wirelessly and perform local computing at the same time. The computing capability of the i-th wireless user equipment (unit: CPU cycles/second) differs between devices and determines the time needed to process a task locally.
  • the computing offloading process can be divided into three parts: first, the wireless user equipment offloads the task data to the edge computing server through wireless transmission; then, the edge computing server allocates computing resources to the offloaded task and completes its computation; finally, the calculation result of the task is sent back to the corresponding wireless user equipment through wireless transmission. Since the amount of task calculation results is much smaller than the amount of task data, this application ignores the transmission delay and energy consumption caused by the download of calculation results. Therefore, the calculation offload delay from the i-th wireless user equipment to the edge computing server can be expressed as:
  • The time for the edge computing server to run the task is:
  • the total amount of computing resources allocated to all offloading tasks from the edge server should be less than the computing resource F of the entire server.
  • the energy consumed by the i-th wireless user equipment while it waits locally for the edge server to execute the task remotely can be expressed by the following formula:
  • this application proposes an optimization cost function that minimizes the total system cost through the joint optimization of task offloading and resource allocation.
  • the specific optimization objective problem is described as follows:
  • the optimization cost function of the entire system in the above formula is divided into two parts, the local computing cost and the cost of offloading the computation to the edge server, which are expressed specifically as:
  • the first and third weighting coefficients are the weights of task processing delay
  • the second and fourth weighting coefficients are the weights of energy consumption, and each weighting coefficient lies between 0 and 1
  • in problem P, x_t denotes the offloading decision variables of all wireless user equipments, h_t refers to the percentage of the total energy that each wireless user equipment consumes to offload data, and f_t is the resource allocation vector, each component of which represents the computing resource allocated by the edge server to one uploaded task.
  • This application stipulates that if wireless user equipment i chooses to perform its task locally, the edge server does not allocate computing resources to it; that is, when the i-th component of x_t is 0, the i-th component of f_t is also 0. Constraint (a) indicates that the wireless user equipment either chooses to offload the task to the server or executes it locally.
  • Constraint (b) indicates that the computing resource allocated by the edge server to any wireless user equipment performing the offloading task cannot exceed the maximum resource value.
  • Constraint (c) ensures that the sum of allocated computing resources does not exceed the maximum resource value of the edge server.
  • Constraint (f) stipulates that in time slice t, the current energy of each wireless user equipment can neither exceed the maximum energy that the equipment can provide nor be negative; otherwise a penalty term needs to be added.
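  • As a hedged illustration of the structure of the cost function and of constraints (a) to (c), the sketch below assumes that each device contributes either a weighted sum of local delay and energy or a weighted sum of offloading delay and energy, selected by its binary offloading decision. The exact cost expressions are not reproduced in this text, so the functions `system_cost` and `feasible`, and the tuple layout of their inputs, are assumptions.

```python
from typing import List, Sequence, Tuple


def system_cost(x: Sequence[int],
                local: Sequence[Tuple[float, float]],
                offload: Sequence[Tuple[float, float]],
                w: Tuple[float, float, float, float]) -> float:
    """Total cost over devices; each tuple is (delay, energy).

    w[0], w[2] weight delay and w[1], w[3] weight energy, for the local
    and offloading parts respectively (assumed weighted-sum form).
    """
    total = 0.0
    for x_i, (d_loc, e_loc), (d_off, e_off) in zip(x, local, offload):
        cost_local = w[0] * d_loc + w[1] * e_loc
        cost_offload = w[2] * d_off + w[3] * e_off
        total += (1 - x_i) * cost_local + x_i * cost_offload
    return total


def feasible(x: Sequence[int], f: Sequence[float], f_max: float) -> bool:
    """Check (a) binary decisions, (b) per-task cap, (c) total cap,
    and that locally executed tasks receive no server resources."""
    binary = all(x_i in (0, 1) for x_i in x)
    per_task = all(0 <= f_i <= f_max for f_i in f)
    total = sum(f) <= f_max
    no_local_alloc = all(f_i == 0 for x_i, f_i in zip(x, f) if x_i == 0)
    return binary and per_task and total and no_local_alloc


cost = system_cost([0, 1],
                   [(2.0, 4.0), (2.0, 4.0)],
                   [(1.0, 1.0), (1.0, 1.0)],
                   (0.5, 0.5, 0.5, 0.5))
```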
  • the specific manner of training the system model according to the optimization cost function to obtain the task offloading model is not limited, and can be set according to actual application requirements.
  • in the case where the task offloading model includes a first task offloading model and a second task offloading model, the step of training the system model according to the optimization cost function to obtain the task offloading model may include the following substeps:
  • dividing the optimization cost function to obtain a first optimization cost function and a second optimization cost function; training the system model according to the first optimization cost function to obtain the first task offloading model; and training the system model according to the second optimization cost function to obtain the second task offloading model.
  • the original optimization problem can be decomposed into two sub-problems: 1) task calculation offloading and energy transmission of wireless user equipment and 2) edge computing server computing resources and energy allocation.
  • FIG. 4 shows the system optimization framework that combines the deep reinforcement learning method with the alternating direction method of multipliers.
  • solving the optimization problem P is a mixed-integer nonlinear programming (MINLP) problem, which is non-convex.
  • the computational complexity of this problem increases sharply, and it is difficult to solve directly. Therefore, the dependence among the four variables to be determined (x_t, f_t, q_t, h_t) is taken into account (for example, if a certain component of x_t is 0, the corresponding components of f_t and h_t are also 0).
  • This application decomposes the problem into the following two subproblems, such that there is no dependence between the variables to be determined within each subproblem: 1) task computation offloading and energy transmission of the wireless user equipment (P1), that is, how to determine x_t and q_t; and 2) computing resource and energy allocation of the edge computing server (P2). Once the values of x_t and q_t are determined, solving for f_t and h_t becomes easy.
  • the specific manner of training the system model according to the first optimization cost function to obtain the first task offloading model is not limited, and can be set according to actual application requirements.
  • the following sub-steps may be included:
  • a deep reinforcement learning model is established based on the system model; the deep reinforcement learning model is trained according to the first optimization cost function to obtain a first task offloading model.
  • In subproblem P1, the computation offloading decision problem for the tasks generated by each wireless user equipment is still non-convex.
  • Traditional numerical optimization methods often require a large number of iterative calculations to obtain satisfactory results, which makes them unsuitable for real-time MEC in dynamic environments where channel gain changes. Therefore, this application adopts reinforcement learning to realize real-time scheduling of computing offloading.
  • In subproblem P1, the system state transition probabilities of a mobile edge computing network are usually unobtainable because of the high-dimensional state space and action space. This application therefore uses a method based on deep reinforcement learning that allows each wireless user equipment to decide, according to the current system state, whether to offload the task arriving in time slice t to the edge server.
  • the specific P1 problem can be expressed as:
  • the method based on reinforcement learning needs to define the state, action and reward function of solving the problem, as follows:
  • the indicator function 1{cond} introduces a task-failure penalty when the condition cond is met, so the penalty cost function is expressed as:
  • ⁇ 1 and ⁇ 2 are the weight of penalty, and
  • To explore complex high-dimensional action spaces more effectively, this application improves the exploration strategy of the twin delayed deep deterministic policy gradient (TD3) algorithm and proposes a reinforcement-learning-based approach for computation offloading and energy transmission (RL-Based approach for Computation Offloading and Energy Transmission, RLCOET). This avoids slow convergence, or convergence to a local optimum, caused by the difficulty of fully exploring the action space.
  • the TD3 algorithm includes two critic networks and one action network. The two critic networks each estimate a Q value (a value prediction), and the action network takes the current state as input and outputs the corresponding action.
  • the action a_t generated by this strategy is combined with the ECRA optimization method to compute the remaining optimization variables of the current time slice, which yields the current reward R_t and the next state s_{t+1}. The tuple (s_t, a_t, R_t, s_{t+1}) is stored in the experience pool as one experience obtained from an interaction with the environment, and a batch of experiences with large loss values is selected through the prioritized experience replay technique to train the neural network.
  • the action network has two branches. One predicts the energy transfer ratio q_t, a one-dimensional continuous variable between 0 and 1; during action exploration, Gaussian noise is added to this item and the result is clipped so that it remains between 0 and 1. The other branch handles x_t, an N-dimensional discrete vector whose solution search space has size 2^N.
  • for x_t, the output of the action network is a continuous relaxed decision variable, from which K discrete decision-making actions are generated using the order-preserving quantization method.
  • the order-preserving quantization method has the advantage of balancing the computational complexity and model performance of the model, and can realize an extensive search of the x t action space when K is small.
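  • The two action branches described above can be sketched in Python as follows. This is a minimal sketch, not the application's own procedure: it assumes the order-preserving quantization rule commonly used for binary offloading decisions (threshold at 0.5 first, then at the relaxed entries closest to 0.5), which the text does not spell out. The function names, the noise scale `sigma`, and the sample values are illustrative assumptions.

```python
import numpy as np


def explore_energy_ratio(q_t, sigma=0.1, rng=None):
    """Add Gaussian exploration noise to q_t and clip the result to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    return float(np.clip(q_t + rng.normal(0.0, sigma), 0.0, 1.0))


def order_preserving_quantize(x_relaxed, k):
    """Turn a relaxed vector in [0, 1]^N into k binary candidate actions.

    Assumed rule: the first candidate thresholds at 0.5; each further
    candidate thresholds at the entry that is next-closest to 0.5.
    """
    x = np.asarray(x_relaxed, dtype=float)
    n = x.size
    if not 1 <= k <= n + 1:
        raise ValueError("k must satisfy 1 <= k <= N + 1")
    candidates = [(x > 0.5).astype(int)]
    # Indices sorted by distance to 0.5, closest first.
    order = np.argsort(np.abs(x - 0.5), kind="stable")
    for idx in order[: k - 1]:
        threshold = x[idx]
        if threshold > 0.5:
            action = x > threshold    # the threshold entry itself becomes 0
        else:
            action = x >= threshold   # the threshold entry itself becomes 1
        candidates.append(action.astype(int))
    return np.stack(candidates)


if __name__ == "__main__":
    print(order_preserving_quantize([0.2, 0.4, 0.7, 0.9], k=4))
```

  • With K much smaller than 2^N, only K candidate offloading vectors need to be evaluated in each time slice, which is the trade-off between complexity and performance mentioned above.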
  • the experience (s_t, a_t, R_t, s_{t+1}) obtained by the RLCOET algorithm each time it interacts with the system environment is stored in the experience pool, where a_t and R_t are the best action and the corresponding reward chosen during action generation and selection.
  • this application adopts the prioritized experience replay technique, builds the experience pool with a SumTree structure, and orders the samples by priority. The higher the loss value of a sample, the higher its priority and the more likely it is to be selected to update the network parameters, which trains the network more effectively and accelerates the convergence of the model.
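  • A SumTree experience pool of the kind described above can be sketched as follows. This is a generic illustration rather than the application's implementation; the class name, the ring-buffer overwrite policy, and the stratified `sample` method are assumptions. Each leaf stores a priority (for example, a sample's loss value), and each internal node stores the sum of its children, so a sample is drawn with probability proportional to its priority.

```python
import random


class SumTree:
    """Binary tree whose leaves hold priorities and whose root holds their sum."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity
        self.write = 0  # next leaf slot to overwrite (ring buffer)

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        """Set the priority of a leaf and propagate the change to the root."""
        change = priority - self.tree[leaf]
        node = leaf
        self.tree[node] = priority
        while node != 0:
            node = (node - 1) // 2
            self.tree[node] += change

    def add(self, priority, experience):
        """Store an experience, overwriting the oldest one when full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = experience
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Return (leaf index, priority, experience) for a cumulative value."""
        node = 0
        while 2 * node + 1 < len(self.tree):
            left = 2 * node + 1
            if value <= self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        return node, self.tree[node], self.data[node - self.capacity + 1]

    def sample(self, batch_size, rng=random):
        """Draw one experience from each of batch_size equal priority segments."""
        segment = self.total() / batch_size
        return [self.get(rng.uniform(i * segment, (i + 1) * segment))
                for i in range(batch_size)]


if __name__ == "__main__":
    pool = SumTree(4)
    for loss, exp in zip([1.0, 2.0, 3.0, 4.0], "abcd"):
        pool.add(loss, exp)
    print(pool.total(), [item[2] for item in pool.sample(2)])
```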
  • this application also performs numerical smoothing over the neighborhood of the target action to reduce errors, that is, a certain amount of noise is added to the output of the target action network.
  • this noise can be regarded as a kind of regularization, which makes the update of the value function more stable and makes the predicted target Q value Q_target more accurate and robust.
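  • The smoothed target value described above can be sketched as follows, following the standard TD3 recipe (clipped Gaussian noise on the target action, then the smaller of the two target critic estimates). The discount factor, the noise parameters, the [0, 1] action range, and the callables `target_actor`, `target_q1`, and `target_q2` are assumptions made for illustration; the application's own target formula is not reproduced in the text.

```python
import numpy as np


def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=None):
    """Compute a TD3-style target Q value for one transition."""
    rng = np.random.default_rng() if rng is None else rng
    action = np.asarray(target_actor(next_state), dtype=float)
    # Target policy smoothing: clipped Gaussian noise around the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    smoothed = np.clip(action + noise, 0.0, 1.0)  # assumed action range [0, 1]
    # Clipped double Q-learning: use the smaller of the two critic estimates.
    q_next = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    return reward + gamma * q_next


if __name__ == "__main__":
    target = td3_target(
        reward=1.0,
        next_state=np.zeros(3),
        target_actor=lambda s: np.array([0.5]),
        target_q1=lambda s, a: 2.0,
        target_q2=lambda s, a: 3.0,
        gamma=0.9,
    )
    print(target)
```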
  • the network loss function also contains two parts.
  • the gradient of the loss function is derived to update the parameters of the action network as follows:
  • N_m is the number of samples selected from the prioritized experience replay pool.
  • for x_t, the average cross-entropy loss is used to update the parameters of the action network:
  • x_t is the offloading vector part of a_t.
  • the total loss function for updating the action network is:
  • the specific manner of training the system model according to the second optimization cost function to obtain the second task offloading model is not limited, and can be set according to actual application requirements.
  • the following sub-steps may be included:
  • An alternating direction multiplier method model is established based on the system model; the alternating direction multiplier method model is trained according to the second optimization cost function to obtain a second task offloading model.
  • the computing resource size and energy allocation ratio of each task uploaded to the edge server can be obtained by using the alternating direction multiplier method.
  • After the optimization variables x_t and q_t of problem P1 have been obtained, an ADMM-based method is adopted to solve problem P2.
  • the ADMM method is a computational framework for solving optimization problems, which is suitable for solving large-scale distributed convex optimization problems. ADMM decomposes a large global problem into multiple smaller and easy-to-solve sub-problems through "decomposition-coordination" processing, and coordinates the solutions of each sub-problem to obtain the solution of the overall global problem.
  • P2 is transformed into a constrained optimization problem involving two types of variables. This structure can easily handle the regularization term in the optimization objective.
  • P2 is solved using the ADMM algorithm and the augmented Lagrangian method, as follows:
  • the penalty coefficient ρ (ρ > 0) is a fixed value.
  • the above three steps are performed iteratively until the following two conditions are met: absolute error and relative error are less than a given threshold.
  • problem P2 can be solved by the ECRA algorithm shown in Figure 6. The convergence of the algorithm can be guaranteed, and its convergence behavior is related to ρ.
  • the complexity of the total algorithm is O( N). It is worth noting that since the original problem is non-convex, although there is no guarantee that the algorithm can find the optimal solution to the original problem, the error between the approximate solution and the optimal solution obtained is within a controllable range.
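  • The ADMM iteration pattern described above (two primal updates, one multiplier update, and a stopping test on absolute and relative tolerances) can be illustrated on a toy problem. The sketch below is not problem P2 and not the ECRA algorithm: it minimizes 0.5·||x − a||² subject to box constraints by splitting the variable into x and z with x = z. The function name, the toy objective, and all default values are assumptions.

```python
import numpy as np


def admm_box_projection(a, lo, hi, rho=1.0, eps_abs=1e-6, eps_rel=1e-6,
                        max_iter=1000):
    """Solve min 0.5*||x - a||^2 s.t. lo <= x <= hi with scaled-form ADMM."""
    a = np.asarray(a, dtype=float)
    n = a.size
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled multiplier (dual variable divided by rho)
    for _ in range(max_iter):
        # Step 1: x-update, closed form for the quadratic objective.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Step 2: z-update, projection onto the box constraint.
        z_old = z
        z = np.clip(x + u, lo, hi)
        # Step 3: multiplier update.
        u = u + x - z
        # Stop when primal and dual residuals are below tolerances that
        # combine an absolute and a relative term.
        r_norm = np.linalg.norm(x - z)
        s_norm = rho * np.linalg.norm(z - z_old)
        eps_pri = np.sqrt(n) * eps_abs + eps_rel * max(np.linalg.norm(x),
                                                       np.linalg.norm(z))
        eps_dual = np.sqrt(n) * eps_abs + eps_rel * rho * np.linalg.norm(u)
        if r_norm < eps_pri and s_norm < eps_dual:
            break
    return z


if __name__ == "__main__":
    print(admm_box_projection([-1.0, 0.3, 2.0], lo=0.0, hi=1.0))
```

  • In this toy case the result is simply `a` clipped to the box, which makes the iteration easy to check; the value of `rho` affects how quickly the residuals shrink.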
  • Figure 7 corresponds to the steps of training the deep reinforcement learning model and the alternating direction multiplier method model.
  • the specific manner of obtaining the task offloading policy is not limited, and can be set according to actual application requirements.
  • the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, and the step of inputting the task to be processed into a preset task offloading model to obtain the task offloading strategy may include the following sub-steps:
  • the first task offloading strategy may include the computing offloading decision variable of each wireless user equipment and the proportion of time spent on wireless charging of the device
  • the second task offloading strategy may include the computing resource size and the energy allocation ratio assigned to each task uploaded to the edge server.
  • the embodiment of the present application provides an efficient online offloading method in a large-scale mobile edge computing network, including the following sub-steps:
  • Step 1 Construct a system model for a large-scale mobile computing network and provide an optimization objective function based on the wireless charging device offloading task execution delay and energy consumption.
  • Step 2 Decompose the original optimization problem into two sub-problems: 1) task calculation offloading and energy transmission of wireless user equipment and 2) edge computing server computing resources and energy allocation, and design a system optimization framework that solves the sub-problems with the deep reinforcement learning method and the alternating direction multiplier method, respectively.
  • Step 3 For sub-problem 1 in step 2, a method based on deep reinforcement learning is proposed to obtain the computing offloading decision variable of each wireless user device and the proportion of time spent on wireless charging of the device.
  • Step 4 For sub-problem 2 in step 2, use the alternating direction multiplier method to obtain the size of computing resources allocated to each task uploaded to the edge server and the energy allocation ratio.
  • Step 5 According to the calculation results of Step 3 and Step 4, an effective optimization algorithm is proposed to train the model until the requirements are met.
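  • Steps 3 and 4 interact in each time slice: every candidate offloading vector is handed to the resource allocation stage, and the candidate with the lowest resulting cost is kept. The sketch below illustrates only this selection pattern. The inner solver is a deliberately simple stand-in (equal sharing of the server's capacity and total delay as the cost), not the ADMM-based allocation of the application, and all names and numbers are hypothetical.

```python
def toy_inner_solver(x, cycles=(4.0, 2.0, 6.0), server_rate=10.0, local_rate=1.0):
    """Hypothetical stand-in for the resource allocation stage.

    Offloaded tasks (x[i] == 1) share the server rate equally; local tasks
    run at local_rate. Returns (total delay, per-task allocated rate).
    """
    offloaded = sum(x)
    share = server_rate / offloaded if offloaded else 0.0
    alloc = [share if xi else local_rate for xi in x]
    cost = sum(c / f for c, f in zip(cycles, alloc))
    return cost, alloc


def select_best_action(candidates, inner_solver):
    """Evaluate each candidate with the inner solver and keep the cheapest."""
    best = None
    for x in candidates:
        cost, alloc = inner_solver(x)
        if best is None or cost < best[1]:
            best = (tuple(x), cost, alloc)
    return best


if __name__ == "__main__":
    candidates = [(0, 0, 0), (1, 1, 1), (1, 0, 1)]
    print(select_best_action(candidates, toy_inner_solver))
```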
  • This application uses a brand-new computing offloading method for mobile edge computing networks.
  • the proposed RLCOET algorithm can obtain an efficient offloading strategy by learning and interacting with wireless user equipment movement in a dynamic edge computing network environment.
  • the method of the present application alleviates the requirement of solving scheduling optimization through repeated iterative calculations, and enables all tasks to obtain satisfactory calculation delay and lower energy consumption.
  • If all scheduling variables are optimized together, convergence may be difficult when there are many variables to be solved.
  • This algorithm decomposes the entire optimization problem into two sub-problems (computation offloading and energy transfer; computing resource and energy allocation) and solves them separately, which effectively reduces the complexity of the algorithm.
  • the proposed algorithm is easy to converge, and a near-optimal computation offloading strategy is obtained in MEC networks with large-scale scheduling variables.
  • the task offloading method is based on a mobile edge computing network.
  • In a mobile edge computing network, when the network infrastructure is unavailable (such as at a natural disaster rescue site), when network equipment is sparsely distributed (such as in a field operation environment), or when a temporary surge of mobile devices far exceeds the network service capacity (such as at a large sporting event or rally), unmanned aerial vehicles (UAVs) can be used as communication relay stations or edge computing platforms because of their high maneuverability and flexibility.
  • researchers have established a communication relationship with users' mobile devices (Mobile Devices, MDs) by deploying relevant wireless communication nodes on UAVs, and proposed the use of UAVs to assist mobile edge computing (Mobile Edge Computing) in various application scenarios.
  • the drone-assisted mobile edge computing network brings many advantages, such as reduced network overhead, lower execution latency for computing tasks, better quality of experience (QoE), and longer battery life for mobile devices.
  • UAV-assisted edge computing systems often only use one or more UAVs as edge computing devices to ensure low latency and reliability of network system computing task transmission. Due to the limitations of the current development of UAV technology and the weak computing power of computing devices deployed in UAVs, it is not enough to use UAV-assisted edge computing networks to provide satisfactory services for multiple mobile devices.
  • a more promising model is to realize the construction of mobile edge computing network among mobile devices, drones and cellular network base stations (cellular base stations, BS).
  • some existing edge computing networks composed of mobile devices, UAVs and base stations only contain one UAV.
  • the computing task requirements of multiple mobile devices cannot be satisfied at the same time, and the task computing delay of the network system is increased.
  • FIG. 8 is a structural block diagram of a data processing system 10 provided by other embodiments of the present application, which provides a possible implementation of the data processing system 10.
  • the data processing system 10 may include one or more of an electronic device 100 and a scheduling optimization system 300.
  • the electronic device 100 communicates with the scheduling optimization system 300. The electronic device 100 obtains the tasks to be processed and the location information from the scheduling optimization system 300 and derives a scheduling strategy from them, so that the scheduling optimization system 300 can perform scheduling optimization processing according to the scheduling strategy.
  • the specific composition of the scheduling optimization system 300 is not limited, and can be set according to actual application requirements.
  • the scheduling optimization system 300 may include at least one base station, a drone, and a mobile device.
  • the electronic device 100 and the mobile device may be the same device; in another alternative example, the electronic device 100 and the drone may be the same device; in yet another alternative example, the electronic device 100 and the base station may be the same device.
  • the number of base stations is not limited, and can be set according to actual application requirements.
  • the number of base stations may be one.
  • this application establishes a mobile edge computing network composed of a single base station, multiple drones, and a large number of mobile devices. Computational tasks generated by mobile devices in the network can be performed on the mobile device itself, offloaded to one of the drones for simple calculations, or further transmitted to the base station for more intensive calculations.
  • FIG. 10 shows one of the flowcharts of the scheduling optimization method provided by the embodiment of the present application.
  • the method can be applied to the electronic device 100 shown in FIG. 19 (described below), and is executed by the electronic device 100 in FIG. 19 .
  • the scheduling optimization device according to the embodiments of the present application may be implemented by the task offloading device according to some embodiments of the present application.
  • the order of some steps in the scheduling optimization method of this embodiment may be exchanged according to actual needs, or some steps may be omitted or deleted.
  • the flow of the scheduling optimization method shown in FIG. 10 will be described in detail below.
  • Step S410 acquiring the pending tasks and current location information of at least one mobile device.
  • the tasks to be processed include the first task and the second task.
  • step S420 the task to be processed and the current location information are input into a preset scheduling optimization model to obtain a scheduling strategy.
  • the scheduling optimization model is obtained by training based on the established initial model.
  • Step S430 sending the scheduling strategy to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing.
  • the above method obtains a scheduling strategy by inputting the pending tasks and current location information into a preset scheduling optimization model, and sends the scheduling strategy to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. Dispatching the first task to the UAV and the second task to the base station avoids the problem in the related art that tasks are either all executed locally on the mobile device or all dispatched to the UAV or the base station for remote execution, which leads to low scheduling optimization efficiency.
  • Before step S410, the scheduling optimization method provided by the embodiment of the present application may also include the step of obtaining a scheduling optimization model. Referring to FIG. 11, this step may include the following sub-steps:
  • Step S440 establishing an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system.
  • step S450 the initial model is trained according to the optimization objective function to obtain a scheduling optimization model.
  • For step S440, it should be noted that the specific ways of establishing the initial model and the optimization objective function are not limited, and can be set according to actual application requirements.
  • step S440 may include the following sub-steps:
  • An initial model is established according to the initial parameters of at least one base station, unmanned aerial vehicle and mobile device; an optimization objective function is established according to the initial model.
  • the initial model may include the system model, calculation model and communication model of the mobile edge computing network system, and the step of establishing the initial model may include the following sub-steps:
  • the network architecture of the system model established in this application is mainly divided into three layers, mobile devices on the ground, drones in the air, and remote base stations.
  • the positions of the three can be represented by a three-dimensional Cartesian coordinate system.
  • the total execution time of the task to be processed is recorded as T, which is evenly divided into N time slices, and the time slice set can be expressed as:
  • this network system assumes that mobile devices cannot directly communicate with the base station, and can only offload tasks to the base station with the help of drones.
  • a collection of mobile devices can be expressed as:
  • M represents the number of mobile devices
  • the position of mobile device MD_m in the time slice TS_n can be expressed as:
  • each mobile device MD m will generate a computationally intensive task, which can be expressed as:
  • T_req indicates the maximum time allowed for executing the current task. Without loss of generality, the maximum allowed execution time is the same for all tasks. In addition, T_req is smaller than the duration of one time slice, which ensures that each task can be executed within one time slice.
  • An onboard CPU with a given maximum computing frequency is embedded in each mobile device MD_m.
  • the set of drones can be expressed as:
  • U represents the number of UAVs
  • the position of UAV u in time slice TS n can be expressed as:
  • H represents the height of the drone.
  • v u (n) denotes the velocity of UAV u in time slice TS n .
  • d min the minimum allowable distance
  • the energy consumption of UAV u in time slice TS n can be expressed as:
  • M g denotes the weight of the UAV u .
  • Each UAV can be deployed as an edge server with a given maximum computing capability. In the time slice TS_n, the CPU computing resources that UAV_u allocates to the computing tasks uploaded to it for execution must satisfy:
  • the location of the base station can be expressed as:
  • x BS and y BS represent the coordinates of the horizontal plane where the base station is located. Due to the high height of the base station and the drone, the base station and the drone are connected through a line-of-sight wireless transmission link and are not directly connected to the mobile device. In this case, the UAV acts as a relay forwarding device, forwarding the tasks offloaded by the mobile device to the base station for further calculation. Since the base station has a powerful computing server and energy supply, the execution time of the computing task at the base station is negligible, and the energy consumption of all tasks performed on the base station is not considered.
  • the offloading method of all computing tasks in this system follows the method of complete offloading, that is, each computing task is either completely executed locally, or completely offloaded to the UAV u , or further completely offloaded to the base station for execution.
  • The task scheduling decision variables represent where each computing task is offloaded:
  • Computing tasks can be performed on mobile devices, drones, and base stations, which are referred to as local computing, drone-side computing, and BS-side computing, respectively. If a task is computed locally, the calculation time of the task is:
  • the energy consumed is:
  • κ_m and v_m are positive coefficients depending on the CPU in mobile device MD_m.
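  • The local computing time and energy expressions did not survive in the text above. The sketch below uses a commonly assumed CPU model (time equals required cycles divided by CPU frequency; energy equals a coefficient times the frequency raised to a power, multiplied by the time) purely as an illustration. The function name, the parameter names `kappa` and `v`, and the sample values are assumptions and may differ from the application's formulas.

```python
def local_compute_cost(cycles, freq_hz, kappa, v):
    """Return (time in s, energy in J) for executing a task locally.

    Assumed model: time = cycles / freq, energy = kappa * freq**v * time.
    """
    if freq_hz <= 0:
        raise ValueError("CPU frequency must be positive")
    time_s = cycles / freq_hz
    energy_j = kappa * freq_hz ** v * time_s
    return time_s, energy_j


if __name__ == "__main__":
    # Hypothetical task: 1e9 CPU cycles on a 1 GHz CPU.
    print(local_compute_cost(cycles=1e9, freq_hz=1e9, kappa=1e-27, v=3))
```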
  • the communication link of the entire network system is divided into two types: the communication link between the mobile device and the UAV, and the communication link between the UAV and the base station.
  • each UAV is assigned an orthogonal communication frequency. Because UAVs fly at high altitude, the wireless communication channels between UAVs and mobile devices or base stations are dominated by line-of-sight wireless transmission.
  • the distance between the UAV u and the base station is:
  • the wireless channel gain between the mobile device MD m and the UAV u is:
  • the wireless channel gain between the UAV u and the base station is:
  • g o is the received power gain at the reference distance of 1 meter.
  • the transmission rate of mission data is:
  • B represents the bandwidth of the network system, σ² represents the communication noise power, and the wireless transmission powers of the mobile device MD_m and the UAV_u in the time slice TS_n satisfy the following conditions, respectively:
  • the time and energy consumed by the mobile device MD m to offload computing tasks to the UAV u are:
  • the time and energy consumed by the unmanned aerial vehicle UAV u to offload computing tasks to the base station are:
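  • The communication formulas referred to above are missing from the text. As a rough illustration, the sketch below uses the usual line-of-sight model: channel gain equal to the reference gain at 1 meter divided by the squared distance, a Shannon-type rate, and transmission energy equal to power multiplied by transmission time. These forms, the function names, and all numbers are assumptions and may not match the application's exact expressions.

```python
import math


def los_channel_gain(pos_a, pos_b, g0):
    """Line-of-sight gain: reference gain at 1 m divided by squared distance."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b))
    return g0 / dist_sq


def transmission_rate(bandwidth_hz, tx_power_w, gain, noise_power_w):
    """Shannon-type rate in bit/s for the given transmit power and gain."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / noise_power_w)


def offload_cost(data_bits, rate_bps, tx_power_w):
    """Return (time in s, energy in J) spent transmitting the task data."""
    time_s = data_bits / rate_bps
    return time_s, tx_power_w * time_s


if __name__ == "__main__":
    # Hypothetical values: a device at the origin and a UAV 100 m overhead.
    gain = los_channel_gain((0.0, 0.0, 0.0), (0.0, 0.0, 100.0), g0=1e-5)
    rate = transmission_rate(1e6, 0.5, gain, 1e-13)
    print(rate, offload_cost(1e6, rate, 0.5))
```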
  • the optimization goal of this network system is to minimize the total energy consumption of mobile devices and UAVs under task delay constraints and system constraints (such as the maximum speed of UAVs, the minimum distance between UAVs and the maximum computing power).
  • the calculation task can be uniformly expressed as:
  • the energy consumed by the mobile device MD_m in performing the computing task can be uniformly expressed as:
  • the energy consumption of all mobile devices during task execution can be expressed as:
  • optimization objective function is defined as follows:
  • Constraint C1 states that the speed of each drone must not exceed the maximum speed and the distance between drones must not fall below the minimum allowable distance.
  • Constraint C2 guarantees that the computing task generated by a mobile device in each time slice is executed on exactly one of the local mobile device, a UAV, or the base station, and that each UAV forwards at most one task to the base station in each time slice.
  • Constraint C3 ensures that the computing resources allocated to local computing and UAV computing in each time slice should not exceed the maximum computing capabilities of mobile devices and UAVs respectively.
  • Constraint C4 states that mobile devices and drones should not exceed their corresponding energy budgets during execution.
  • Constraint C5 states that the transmit power allocated by mobile devices and drones cannot exceed the maximum allowable value.
  • Constraint C6 ensures that the execution of each task should meet the delay requirement.
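  • As an illustration of how such constraints can be checked for a candidate solution, the sketch below tests constraint C1 (maximum UAV speed and minimum distance between UAVs) for one time slice. The function name, the argument layout, and the sample values are assumptions; positions are given as horizontal coordinates because the UAVs are modeled at a fixed height.

```python
import math
from itertools import combinations


def satisfies_c1(prev_positions, next_positions, slice_s, v_max, d_min):
    """Check the speed limit and the minimum UAV separation for one time slice."""
    # Speed limit: distance flown within the slice divided by its duration.
    for p, q in zip(prev_positions, next_positions):
        if math.dist(p, q) / slice_s > v_max:
            return False
    # Collision avoidance: every pair of UAVs keeps at least d_min apart.
    for a, b in combinations(next_positions, 2):
        if math.dist(a, b) < d_min:
            return False
    return True


if __name__ == "__main__":
    prev = [(0.0, 0.0), (10.0, 0.0)]
    nxt = [(3.0, 4.0), (10.0, 0.0)]
    print(satisfies_c1(prev, nxt, slice_s=1.0, v_max=5.0, d_min=5.0))
```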
  • For step S450, it should be noted that the specific manner of training the model is not limited, and can be set according to actual application requirements.
  • the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and step S450 may include the following sub-steps:
  • the optimization objective function is split to obtain a first optimization objective function, a second optimization objective function and a third optimization objective function; the initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, according to the second optimization objective function to obtain the computing task joint scheduling model, and according to the third optimization objective function to obtain the resource allocation model.
  • the problem P is difficult to solve, mainly for the following reasons: 1) since A is a discrete binary variable while L, P, and F are continuous variables, the problem is a mixed-integer nonlinear programming problem, which is NP-hard; 2) because the network system requires fast responses, the scheduling optimization algorithm must make real-time scheduling decisions in each time slice; 3) since the positions of mobile devices and UAVs change, P must be solvable in a dynamically changing environment.
  • this application decomposes the optimization objective function P into three sub-problems: UAV trajectory planning (P1, the first optimization objective function), joint scheduling of computing tasks (P2, the second optimization objective function), and computing and transmission resource allocation (P3, the third optimization objective function). This yields an efficient mobile edge computing network scheduling strategy and greatly reduces the complexity of solving the optimization problem.
  • the trajectory position L of the UAV is only weakly dependent on the other three variables, and its optimization is mainly based on observations of the mobile device positions. The optimization goal is to bring the UAV as close as possible to the mobile devices and the base station; therefore, the UAV trajectory optimization can be expressed as:
  • the task offloading decision variable A needs to be optimized before the variables P and F. Optimizing A for the current mobile device clusters, with the goal of minimizing the maximum computation delay over all tasks, makes it easier to satisfy constraint C6 of the original problem P, so the joint scheduling sub-problem of computing tasks can be expressed as:
  • This algorithm framework consists of three models: a UAV trajectory planning model (UAV Trajectory Planning, UTP), a computing task joint scheduling model (Task Association Scheduling, TAS), and a computing and transmission resource allocation model (Resource Allocation, RA), which correspond to the optimization sub-problems P1, P2 and P3, respectively.
  • The UTP model processes the positions of the mobile devices. Since the location of each mobile device differs between time slices, the UTP model predicts the movement of the mobile devices and guides the UAVs to appropriate positions. Because the motion of a mobile device follows neither a Gaussian nor a linear distribution, this application can use a long short-term memory network to model the motion distribution of the mobile devices. After the prediction is completed, the mobile devices need to be divided into U clusters, one per drone, so that each drone can serve the mobile devices in its cluster. To allow soft clustering, that is, each mobile device may be served by different drones in different time slices (but by no more than one drone in the same time slice), the UTP model adopts the fuzzy C-means clustering method, which clusters according to the similarity of channel power gains. After clustering, the center point of each cluster is output by the UTP model as the position to which the corresponding UAV moves.
  • the TAS model receives inputs from the UTP model and from the network environment, and generates the values of the task scheduling decision variables according to time-varying channel conditions and computing task requirements.
  • This application can use an advanced deep reinforcement learning (DRL) method, the deep deterministic policy gradient algorithm (Deep Deterministic Policy Gradient, DDPG), which gains experience from the interaction between the algorithm model and the environment and outputs the optimized decision-making action a_n.
  • other reinforcement learning algorithms suitable for continuous actions can also be used.
  • the output action a_n is a one-dimensional vector, each item of which is a continuous variable relaxed to the range between 0 and 1.
  • Each item of a_n can be viewed as the probability that a computing task is executed on device k (this is why each item is set to a continuous value between 0 and 1). Since the task scheduling decision variables should be two-dimensional binary values, the items of a_n are reshaped and rounded to 1 or 0 according to the task association constraints of the optimization problem, and the result is used as the output of the TAS model.
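  • One simple way to turn the relaxed action vector into binary scheduling decisions is sketched below. It reshapes a_n into a tasks-by-targets matrix and selects the most probable target for each task, so every task is executed in exactly one place. This is an assumed simplification: the function name and the matrix layout are hypothetical, and the further constraint that each UAV forwards at most one task to the base station is not enforced here.

```python
import numpy as np


def binarize_association(a_n, num_tasks, num_targets):
    """Reshape a relaxed action vector and pick one execution target per task."""
    probs = np.asarray(a_n, dtype=float).reshape(num_tasks, num_targets)
    decision = np.zeros((num_tasks, num_targets), dtype=int)
    # One-hot row per task: the target with the largest relaxed value wins.
    decision[np.arange(num_tasks), probs.argmax(axis=1)] = 1
    return decision


if __name__ == "__main__":
    # Hypothetical output for 2 tasks and 3 possible execution targets.
    print(binarize_association([0.1, 0.7, 0.2, 0.6, 0.3, 0.1], 2, 3))
```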
  • the environment receives the actions output by the above three models and generates a reward r_n (as the input of DDPG) and a new state (the components of the state are sent to the corresponding components of the algorithm framework). Thereafter, the algorithm enters the next time slice and repeats the above three steps.
  • the optimal location plan of the UAV can be calculated using a long short-term memory network and fuzzy C-means clustering.
  • the trajectory planning of UAV can be divided into mobile device motion prediction and mobile device clustering two parts.
  • the distance between the UAV and the mobile device is the main factor affecting other scheduling variables, so the ideal trajectory of the UAV is to gradually move towards the mobile device and get as close as possible to the mobile device.
  • the algorithm proposed in this application predicts the location of the mobile device to assist the movement of the UAV. Because this prediction is mainly based on the position of the mobile device in the previous time slice, this application uses the LSTM recurrent neural network to model the time series.
  • the long short-term memory (LSTM) network is a recurrent neural network that accepts an external input and feedback inputs (C_{n-1} and h_{n-1}).
  • the output of the LSTM includes two items (C_n and h_n), which are fed back into the LSTM itself for processing in the next time slice.
  • C_n is obtained by:
  • σ and tanh represent the sigmoid and hyperbolic tangent activation functions, respectively
  • W_f, W_i and W_C represent the network weights of the corresponding neural network layers
  • b_f, b_i and b_C represent the corresponding bias vectors
  • h_n is calculated by the following formula:
  • W_O and b_o are the parameters that the neural network needs to learn.
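  • For reference, the gate equations of a standard LSTM cell can be written with the symbols defined above as in the sketch below. This is the textbook formulation, offered as an assumption about the general form rather than as the application's own formulas. The input x_n, the gates f_n, i_n and o_n, the candidate state \tilde{C}_n, and the concatenation [h_{n-1}, x_n] are notation introduced here for illustration.

```latex
\begin{aligned}
f_n &= \sigma\left(W_f\,[h_{n-1}, x_n] + b_f\right) && \text{(forget gate)}\\
i_n &= \sigma\left(W_i\,[h_{n-1}, x_n] + b_i\right) && \text{(input gate)}\\
\tilde{C}_n &= \tanh\left(W_C\,[h_{n-1}, x_n] + b_C\right) && \text{(candidate state)}\\
C_n &= f_n \odot C_{n-1} + i_n \odot \tilde{C}_n && \text{(cell state)}\\
o_n &= \sigma\left(W_O\,[h_{n-1}, x_n] + b_o\right) && \text{(output gate)}\\
h_n &= o_n \odot \tanh\left(C_n\right) && \text{(hidden state)}
\end{aligned}
```

  • In this form, C_n and h_n are exactly the two items that are fed back into the cell in the next time slice.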
  • this application proposes an LSTM-based mobile device location observation model to predict the location of the mobile device, and its time series expansion is shown in FIG. 14 .
  • the current location of the mobile device is input to the LSTM network, and the LSTM outputs h_n.
  • a fully connected layer is also added to the output to fine-tune h_n as follows:
  • relu is the ReLU activation function, and the weights and bias of the fully connected layer are variables learned during the training of the neural network.
  • starting from fuzzy theory, the FCM method assigns each mobile device MD_m a metric value d_{m,u} for each cluster in the time slice TS_{n+1}, calculated as follows:
  • c_u represents the position of the UAV in the nth time slice
  • c_k represents the center point of the kth cluster
  • before iterating, every c_u should be initialized. Each c_u is initialized from the previous center point because mobile devices can only move within a small range, so their new center points are likely to be close to the previous ones (these center points are planned as the positions to which the UAVs move).
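  • The clustering step can be illustrated with a minimal fuzzy C-means sketch in plain Python. It uses the standard FCM membership and center-update rules, which is an assumption, since the application's own formula for d_{m,u} is not shown here. The function names, the `fuzzifier` parameter and the fixed iteration count are hypothetical choices made for the example.

```python
import math


def fcm_memberships(points, centers, fuzzifier=2.0):
    """Return memberships[m][u]: degree to which point m belongs to cluster u."""
    exponent = 2.0 / (fuzzifier - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_u / d_k) ** exponent for d_k in dists) for d_u in dists]
        )
    return memberships


def fcm_update_centers(points, memberships, old_centers, fuzzifier=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for u, old in enumerate(old_centers):
        weights = [row[u] ** fuzzifier for row in memberships]
        total = sum(weights)
        if total == 0.0:
            centers.append(old)  # no point supports this cluster; keep it in place
            continue
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fcm(points, initial_centers, iterations=20, fuzzifier=2.0):
    """Run FCM starting from the previous centers (e.g. last UAV positions)."""
    centers = list(initial_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, fuzzifier)
        centers = fcm_update_centers(points, memberships, centers, fuzzifier)
    return centers, fcm_memberships(points, centers, fuzzifier)
```

  • Passing the previous center points as `initial_centers` mirrors the initialization described above: devices move only a short distance per time slice, so few iterations are usually needed.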
  • after each mobile device MD_m has been assigned a metric d_{m,u} representing its membership in the u-th cluster, d_{m,u} can be further converted to a binary clustering decision by an exploration strategy, which reduces the possibility of getting stuck in a local minimum of the optimization objective O.
  • with ε_c denoting the exploration threshold, each mobile device MD_m is assigned with probability 1 − ε_c to the cluster with the largest metric value, and to one of the other clusters with probability ε_c.
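  • The exploration strategy can be sketched as follows. The function name and the `rng` parameter are hypothetical, and picking uniformly among the non-best clusters is an assumption: the text only states that the other clusters are chosen with the exploration probability.

```python
import random


def assign_cluster(membership_row, exploration_threshold, rng=random):
    """Turn one device's fuzzy memberships into a single cluster index.

    With probability 1 - exploration_threshold the cluster with the largest
    metric value is chosen; otherwise one of the other clusters is chosen
    (uniformly here, which is an assumption of this sketch).
    """
    indices = range(len(membership_row))
    best = max(indices, key=membership_row.__getitem__)
    others = [u for u in indices if u != best]
    if others and rng.random() < exploration_threshold:
        return rng.choice(others)
    return best
```

  • Setting the threshold to 0 recovers a purely greedy assignment, which is a convenient way to disable exploration at evaluation time.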
  • the algorithm in Figure 15 describes in detail the clustering process of mobile devices based on FCM in the nth time slice.
  • the output c_u of Algorithm 1 guides the movement of the UAV.
  • the task scheduling decision variables of each mobile device can be obtained by using the reinforcement-learning-based deep deterministic policy gradient algorithm.
  • the joint scheduling of computing tasks includes two parts: DDPG-based optimization of the task scheduling decision variables and integration of the scheduling variables.
  • the algorithm framework uses the DDPG reinforcement learning algorithm to learn the scheduling strategy of computing tasks, namely:
  • The policy is a mapping function from environment state to decision-making action, and the state of the network environment is:
  • Each component of a_n is a continuous variable from 0 to 1, whose magnitude is:
  • s_{n+1} is the new state of the environment after decision-making action a_n is taken in state s_n
  • γ is the discount factor for future rewards.
  • the actor neural network (Actor) and the critic neural network (Critic) Q each have their own parameters.
  • target networks can be used (a target actor network and a target critic network, each with its own parameters), and their parameters are updated periodically.
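  • A periodic target-network update can be sketched as below. Representing parameters as a dictionary of lists, the function name and the `period` argument are all assumptions made for illustration. Many DDPG implementations instead blend the parameters a little on every step (a soft update), which would be an alternative to the periodic copy shown here.

```python
def maybe_sync_target(step, online_params, target_params, period=100):
    """Copy the online network's parameters into the target network every `period` steps.

    Both parameter sets are plain dicts mapping a layer name to a list of floats
    (a stand-in for real tensors). Returns True when a copy was performed.
    """
    if step % period != 0:
        return False
    target_params.clear()
    for name, values in online_params.items():
        target_params[name] = list(values)  # copy so later training does not alias
    return True
```

  • Keeping the target parameters fixed between copies gives the critic a stable regression target while the online networks continue to change.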
  • in time slice TS_n, the environment transitions from state s_n to state s_{n+1} after accepting the action a_n output by the algorithm model, and generates a reward r_n. These four items are packed into a tuple (s_n, a_n, s_{n+1}, r_n) and stored in an experience replay pool.
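  • A minimal experience replay pool matching the tuple layout above might look like the following sketch. The class name, the fixed capacity with oldest-first eviction, and the optional seed are assumptions not stated in the text.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity store of (s_n, a_n, s_next, r_n) transitions."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def store(self, s_n, a_n, s_next, r_n):
        """Pack one transition into a tuple and append it to the pool."""
        self._buffer.append((s_n, a_n, s_next, r_n))

    def sample(self, batch_size):
        """Return a uniformly random batch (smaller if the pool is not yet full)."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```

  • Sampling uniformly at random breaks the correlation between consecutive time slices, which is the usual motivation for training from a replay pool rather than from the most recent transition.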
  • a batch of samples is randomly selected from the experience replay pool, and the critic neural network (that is, its parameters) is trained according to the following loss function.
  • the actor network trains its parameters by minimizing the following gradient function:
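  • For orientation, the critic loss and actor gradient of the standard DDPG algorithm are sketched below. The notation is the conventional one and is assumed here rather than taken from the application: μ and Q are the actor and critic with parameters θ^μ and θ^Q, primed symbols are the target networks, γ is the discount factor, and B is the batch size.

```latex
\begin{aligned}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'\!\left(s_{i+1} \mid \theta^{\mu'}\right) \mid \theta^{Q'}\right) \\
L\!\left(\theta^{Q}\right) &= \frac{1}{B} \sum_{i=1}^{B} \left( y_i - Q\!\left(s_i, a_i \mid \theta^{Q}\right) \right)^{2} \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{B} \sum_{i=1}^{B}
  \nabla_{a} Q\!\left(s, a \mid \theta^{Q}\right)\Big|_{s = s_i,\, a = \mu(s_i)}
  \nabla_{\theta^{\mu}} \mu\!\left(s \mid \theta^{\mu}\right)\Big|_{s = s_i}
\end{aligned}
```

  • In the standard formulation the actor ascends this gradient, which is equivalent to minimizing the negative of the critic's value estimate.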
  • because the action a_n output by the actor network is a one-dimensional vector, and each item of a_n is a continuous value ranging from 0 to 1, it is necessary to reshape a_n into two dimensions and integrate each item to 0 or 1 for further task scheduling.
  • Algorithm 3 is the reshaping and integration algorithm for a_n. After the above task scheduling variables have been reshaped and integrated, the output a[m][k] of Algorithm 3 is passed to the RA module for optimized resource allocation.
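  • The reshaping and integration step can be sketched as follows. This is not Algorithm 3 itself. It assumes that the task association constraint means each mobile device's task runs on exactly one target, so each row becomes one-hot at its largest entry. The function and parameter names are hypothetical.

```python
def reshape_and_integrate(a_n, num_devices, num_targets):
    """Reshape a flat action vector into a[m][k] and round each row to one-hot.

    a_n holds num_devices * num_targets values in [0, 1], laid out row by row.
    Row m marks with a 1 the target k that has the largest value (first one on ties).
    """
    if len(a_n) != num_devices * num_targets:
        raise ValueError("a_n length must equal num_devices * num_targets")
    decisions = []
    for m in range(num_devices):
        row = a_n[m * num_targets:(m + 1) * num_targets]
        best = max(range(num_targets), key=row.__getitem__)
        decisions.append([1 if k == best else 0 for k in range(num_targets)])
    return decisions
```

  • For example, with two devices and three targets, the flat vector [0.2, 0.7, 0.1, 0.9, 0.05, 0.05] becomes [[0, 1, 0], [1, 0, 0]].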
  • a method based on convex optimization can be used to determine the allocation of computing and transmission resources in the network system. The outputs of the preceding modules are used as input to the RA module for final processing. According to sub-problem P3, the optimization variables P and F can be solved directly by convex optimization using the CVXPY tool.
  • for step S420, it should be noted that the specific manner of obtaining the scheduling strategy is not limited and can be set according to actual application requirements.
  • step S420 may include the following sub-steps:
  • the pending tasks and task scheduling decision variables are input into the resource allocation model, and the scheduling strategy is calculated.
  • this step may include the following sub-steps:
  • steps of performing prediction processing and clustering processing can refer to the process of obtaining the UAV trajectory planning model through training above.
  • this step may include the following sub-steps:
  • the task joint scheduling training process is performed according to the pending tasks and the predicted position information, and the decision-making action of at least one mobile device is obtained; the decision-making action is integrated, and the task scheduling decision variable is obtained.
  • steps of performing training processing and integration processing can refer to the above-mentioned process of training and obtaining a joint scheduling model for computing tasks.
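  • The three sub-steps of step S420 chain together as in the sketch below. The three model arguments are hypothetical callables standing in for the trained UAV trajectory planning model, the computing task joint scheduling model and the resource allocation model. Their signatures are assumptions made for illustration.

```python
def compute_scheduling_strategy(pending_tasks, current_positions,
                                trajectory_model, joint_scheduling_model,
                                resource_allocation_model):
    """Run the three sub-models in sequence and return the scheduling strategy."""
    # Sub-step 1: predict where each mobile device will be.
    predicted_positions = trajectory_model(current_positions)
    # Sub-step 2: derive the task scheduling decision variables.
    decision_variables = joint_scheduling_model(pending_tasks, predicted_positions)
    # Sub-step 3: allocate computing and transmission resources.
    return resource_allocation_model(pending_tasks, decision_variables)
```

  • Each stage consumes only the output of the previous one plus the pending tasks, so the sub-models can be trained and replaced independently.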
  • this application deploys a mobile edge computing network consisting of a single base station, multiple UAVs and a large number of mobile devices. Each computing task can be executed on the mobile device, offloaded to a UAV for computing, or further offloaded to the base station, with the UAV acting as a relay, for more intensive computing.
  • the joint optimization problem of UAV trajectory, task association, computing and transmission resource allocation is determined.
  • this application decomposes the optimization problem into three sub-problems, which greatly reduces the energy consumption of the overall network system, prolongs the life of the network, reduces the computing delay of all mobile devices in the communication network, and improves the quality of service for computing-intensive applications.
  • FIG. 19 is a schematic block diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 in this embodiment may be a server capable of data interaction and processing, a processing device, a processing platform, and the like.
  • the electronic device 100 includes a first memory 110 , a first processor 120 and a communication module 130 .
  • the components of the first memory 110 , the first processor 120 and the communication module 130 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines.
  • the first memory 110 is used to store programs or data.
  • the first memory 110 can be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
  • the first processor 120 is used for reading/writing data or programs stored in the first memory 110 and performing corresponding functions.
  • the communication module 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through the network, and is used to send and receive data through the network.
  • FIG. 19 is only a schematic structural diagram of the electronic device 100, and the electronic device 100 may also include more or fewer components than those shown in FIG. 19 , or have a configuration different from that shown in FIG. 19 .
  • Each component shown in FIG. 19 may be implemented using hardware, software, or a combination thereof.
  • the embodiment of the present application further provides a task offloading device 400 , and the functions implemented by the task offloading device 400 correspond to the steps performed by the above task offloading method.
  • the task offloading apparatus 400 can be understood as a processor of the above-mentioned electronic device 100 , and can also be understood as a component independent of the above-mentioned electronic device 100 or the processor that implements the functions of the present application under the control of the electronic device 100 .
  • the task offloading apparatus 400 may include a task acquiring module 410 , an offloading policy acquiring module 420 and an offloading policy sending module 430 .
  • the task acquisition module 410 may be configured to acquire at least one pending task of the first device 210, wherein the pending task includes a target task.
  • the task acquisition module 410 may be used to execute step S310 shown in FIG. 3 , and for relevant content of the task acquisition module 410 , please refer to the foregoing description of step S310 .
  • the offloading policy acquisition module 420 may be configured to input the tasks to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model.
  • the offloading policy acquisition module 420 may be used to execute step S320 shown in FIG. 3 , and for related content of the offloading policy acquisition module 420 , please refer to the foregoing description of step S320 .
  • the offloading policy sending module 430 may be configured to send the task offloading strategy to at least one first device 210, so that the at least one first device 210 offloads the target task to the second device 220 based on the task offloading strategy, and the second device 220 executes the target task.
  • the offloading policy sending module 430 may be used to execute step S330 shown in FIG. 3 , and for related content of the offloading policy sending module 430 , refer to the foregoing description of step S330 .
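  • The division of the task offloading apparatus 400 into three modules can be sketched as a small class. The class name and the three injected callables are hypothetical stand-ins for modules 410, 420 and 430, and the sketch only shows how steps S310 to S330 are sequenced.

```python
class TaskOffloadingApparatus:
    """Sequences task acquisition (S310), policy acquisition (S320) and sending (S330)."""

    def __init__(self, acquire_tasks, offloading_model, send_policy):
        self._acquire_tasks = acquire_tasks        # stands in for module 410
        self._offloading_model = offloading_model  # stands in for module 420
        self._send_policy = send_policy            # stands in for module 430

    def run(self, first_devices):
        # S310: collect the pending tasks of the first devices.
        pending_tasks = self._acquire_tasks(first_devices)
        # S320: feed them to the preset task offloading model.
        policy = self._offloading_model(pending_tasks)
        # S330: send the resulting strategy to every first device.
        for device in first_devices:
            self._send_policy(device, policy)
        return policy
```

  • Injecting the three behaviours keeps the orchestration independent of how the model was trained or how the devices are reached over the network.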
  • some other implementations of the embodiments of the present application further provide a scheduling optimization device 500 .
  • the scheduling optimization apparatus described in some other implementation manners of the embodiments of the present application may be implemented as the task offloading apparatus described in some implementation manners of the present application.
  • the functions implemented by the scheduling optimization apparatus 500 correspond to the steps performed by the above scheduling optimization method.
  • the task acquisition module 510 may be configured to acquire pending tasks and current location information of at least one mobile device, wherein the pending tasks include the first task and the second task.
  • the task acquisition module 510 may be used to execute step S410 shown in FIG. 10 , and for relevant content of the task acquisition module 510 , please refer to the foregoing description of step S410 .
  • the scheduling strategy acquisition module 520 may be configured to input the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model.
  • the scheduling strategy acquisition module 520 may be used to execute step S420 shown in FIG. 10 , and for relevant content of the scheduling strategy acquisition module 520 , refer to the foregoing description of step S420 .
  • the scheduling strategy sending module 530 may be configured to send the scheduling strategy to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through the at least one UAV to at least one base station for processing.
  • the scheduling strategy sending module 530 may be used to execute step S430 shown in FIG. 10 , and for relevant content of the scheduling strategy sending module 530 , refer to the foregoing description of step S430 .
  • an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above task offloading method and/or the above scheduling optimization method are executed.
  • the computer program product of the task offloading method provided in the embodiment of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the steps of the task offloading method and/or the scheduling optimization method in the above method embodiments. For details, refer to the foregoing method embodiments, which are not repeated here.
  • in the task offloading method and device, electronic device, and storage medium provided by some embodiments of the present application, a task offloading strategy is obtained by inputting the tasks to be processed into a task offloading model, and the task offloading strategy is sent to the first device so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to the server for processing, which avoids the low task offloading efficiency of the related art, in which tasks are either all executed locally on the wireless user equipment or all offloaded to the server for remote execution.
  • the scheduling strategy is obtained by inputting the pending tasks and current location information into the preset scheduling optimization model, and the scheduling strategy is sent to at least one mobile device so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through the at least one UAV to at least one base station for processing. The first task is thereby scheduled to the UAV and the second task to the base station, which avoids the low scheduling optimization efficiency of the related art, in which tasks are either all executed locally on the mobile device or all dispatched to the UAV or the base station for remote execution.
  • the application provides a task offloading method and device, electronic equipment and a storage medium, and relates to the technical field of task offloading.
  • the task offloading method is applied to an electronic device, and the electronic device is connected in communication with a task offloading system.
  • the task offloading system includes a second device and at least one first device.
  • the task offloading method includes: first, obtaining a pending task of at least one first device; second, inputting the pending task into the preset task offloading model to obtain the task offloading strategy; then, sending the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task.
  • the embodiments of the present application also provide a scheduling optimization method and device, electronic equipment, and a storage medium.
  • the task offloading method, scheduling optimization method and device, electronic equipment and storage medium of the present application are reproducible and can be used in various industrial applications.
  • the task offloading method, scheduling optimization method and device, electronic device, and storage medium of the present application may be used in the technical field of task offloading and scheduling optimization.

Abstract

A task offloading method and apparatus, an electronic device, and a storage medium. The task offloading method comprises: S310, acquiring a task to be processed of at least one first device; S320, inputting said task into a preset task offloading model, so as to obtain a task offloading strategy; and S330, sending the task offloading strategy to the at least one first device, such that the at least one first device offloads a target task to a second device on the basis of the task offloading strategy, and the second device then processes the target task.

Description

Task offloading method, scheduling optimization method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110537588.4, entitled "Task offloading method and apparatus, electronic device, and storage medium", filed with the China National Intellectual Property Administration on May 18, 2021, and to Chinese patent application No. 202110765005.3, entitled "Scheduling optimization method and apparatus, electronic device, and storage medium", filed with the China National Intellectual Property Administration on July 7, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of task offloading and scheduling optimization, and in particular to a task offloading method, a scheduling optimization method and apparatus, an electronic device, and a storage medium.
Background
One of the key problems to be solved in mobile edge computing networks is computation offloading: whether wireless user equipment offloads a computing task to a nearby server or executes it locally, and how to allocate resources (such as computing resources and energy resources) to the tasks offloaded to the server.
However, the inventors have found through research that, in the related art, tasks are either all executed locally on the wireless user equipment or all offloaded to the server for remote execution, so task offloading is inefficient.
In addition, when network infrastructure is unavailable (for example, at a rescue site after a natural disaster), when network equipment is sparsely distributed (for example, in a field operation environment), or when a temporary surge of mobile devices far exceeds the network's service capacity (for example, at a large competition or gathering), UAVs can be used as communication relay stations or edge computing platforms. In UAV-assisted mobile edge computing, appropriate decisions must be made about how computing tasks are scheduled in the mobile edge computing network (whether a computing task is executed locally on the mobile device or dispatched to a UAV or base station for execution) in order to obtain the desired performance.
However, the inventors have found through research that, in the related art, tasks are either all executed locally on the mobile device or all dispatched to a UAV or base station for remote execution, so scheduling optimization is inefficient.
Summary
In view of this, one aspect of the present application provides a task offloading method and apparatus, an electronic device, and a storage medium, so as to address the problems in the related art.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
One aspect of the present application provides a task offloading method. The task offloading method is applied to an electronic device that is communicatively connected to a task offloading system, the task offloading system includes a second device and at least one first device, and the task offloading method may include:
acquiring a pending task of the at least one first device, wherein the pending task includes a target task;
inputting the pending task into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;
sending the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task.
In an optional implementation, the task offloading method may further include a step of obtaining the task offloading model, which may include:
establishing a system model and an optimization cost function according to cost parameters of the task offloading system;
training the system model according to the optimization cost function to obtain the task offloading model.
In an optional implementation, the step of establishing a system model and an optimization cost function according to the cost parameters of the task offloading system may include:
establishing the system model according to cost parameters of the at least one first device and the second device;
establishing the optimization cost function according to the system model.
In an optional implementation, the task offloading model includes a first task offloading model and a second task offloading model, and the step of training the system model according to the optimization cost function to obtain the task offloading model may include:
splitting the optimization cost function to obtain a first optimization cost function and a second optimization cost function;
training the system model according to the first optimization cost function to obtain the first task offloading model;
training the system model according to the second optimization cost function to obtain the second task offloading model.
In an optional implementation, the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, and the step of inputting the pending task into the preset task offloading model to obtain the task offloading strategy may include:
inputting the pending task into the first task offloading model to obtain the first task offloading strategy;
inputting the pending task into the second task offloading model to obtain the second task offloading strategy.
In an optional implementation, the step of training the system model according to the first optimization cost function to obtain the first task offloading model may include:
establishing a deep reinforcement learning model based on the system model;
training the deep reinforcement learning model according to the first optimization cost function to obtain the first task offloading model.
In an optional implementation, the step of training the system model according to the second optimization cost function to obtain the second task offloading model may include:
establishing an alternating direction method of multipliers (ADMM) model based on the system model;
training the ADMM model according to the second optimization cost function to obtain the second task offloading model.
The present application further provides a task offloading apparatus. The task offloading apparatus is applied to an electronic device that is communicatively connected to a task offloading system, the task offloading system includes a second device and at least one first device, and the task offloading apparatus may include:
a task acquisition module configured to acquire a pending task of the at least one first device, wherein the pending task includes a target task;
a strategy acquisition module configured to input the pending task into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;
a strategy sending module configured to send the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task.
The present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the task offloading method of any one of the foregoing implementations.
The present application provides a storage medium including a computer program, wherein the computer program, when run, controls the electronic device in which the storage medium is located to execute the task offloading method of any one of the foregoing implementations.
In the task offloading method and apparatus, electronic device, and storage medium provided in the embodiments of the present application, a task offloading strategy is obtained by inputting the pending task into the task offloading model, and the task offloading strategy is sent to the first device so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to the server for processing, which avoids the low task offloading efficiency of the related art, in which tasks are either all executed locally on the wireless user equipment or all offloaded to the server for remote execution.
本申请的另一方面还提供了一种调度优化方法和装置、电子设备及存储介质,以改善相关技术中存在的问题。Another aspect of the present application also provides a method and device for scheduling optimization, electronic equipment, and storage media, so as to improve the problems existing in related technologies.
为实现上述目的,本申请实施例采用如下技术方案:In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
本申请的另一方面还提供了一种调度优化方法,所述调度优化方法应用于电子设备,该电子设备与移动边缘计算网络系统通信连接,所述移动边缘计算网络系统包括至少一个基站、无人机和移动设备,所述调度优化方法可以包括:Another aspect of the present application also provides a scheduling optimization method. The scheduling optimization method is applied to electronic equipment, and the electronic equipment is connected to a mobile edge computing network system in communication. The mobile edge computing network system includes at least one base station, wireless For man-machine and mobile devices, the scheduling optimization method may include:
获取所述至少一个移动设备的待处理任务和当前位置信息,其中,所述待处理任务包括第一任务和第二任务;Obtain pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;
将所述待处理任务和当前位置信息输入预设的调度优化模型,得到调度策略,其中,所述调度优化模型基于建立的初始模型进行训练得到;Inputting the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;
将所述调度策略发送至所述至少一个移动设备,以使所述至少一个移动设备基于所述调度策略将所述第一任务发送至所述至少一个无人机进行处理,将所述第二任务通过所述至少一个无人机转发至所述至少一个基站进行处理。sending the scheduling strategy to the at least one mobile device, so that the at least one mobile device sends the first task to the at least one drone for processing based on the scheduling strategy, and the second The task is forwarded by the at least one drone to the at least one base station for processing.
在可选的实施方式中,所述调度优化方法可以采用根据本申请的实施方式所述的任务卸载方法来实现。In an optional implementation manner, the scheduling optimization method may be implemented by using the task offloading method according to the implementation manners of the present application.
在可选的实施方式中,所述调度优化方法还包括获取调度优化模型的步骤,该步骤可以包括:In an optional implementation manner, the scheduling optimization method further includes the step of obtaining a scheduling optimization model, which may include:
根据所述移动边缘计算网络系统的初始参数建立初始模型和优化目标函数;Establishing an initial model and optimizing an objective function according to the initial parameters of the mobile edge computing network system;
根据所述优化目标函数对所述初始模型进行训练,得到调度优化模型。The initial model is trained according to the optimization objective function to obtain a scheduling optimization model.
In an optional implementation, the step of establishing an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system may include:
establishing the initial model according to initial parameters of the at least one base station, the at least one UAV, and the at least one mobile device;
establishing the optimization objective function according to the initial model.
In an optional implementation, the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and the step of training the initial model according to the optimization objective function to obtain the scheduling optimization model may include:
splitting the optimization objective function to obtain a first optimization objective function, a second optimization objective function, and a third optimization objective function;
training the initial model according to the first optimization objective function to obtain the UAV trajectory planning model, training the initial model according to the second optimization objective function to obtain the computing task joint scheduling model, and training the initial model according to the third optimization objective function to obtain the resource allocation model.
In an optional implementation, the step of inputting the pending tasks and the current location information into the preset scheduling optimization model to obtain the scheduling strategy may include:
inputting the current location information into the UAV trajectory planning model to compute predicted location information of the at least one mobile device;
inputting the pending tasks and the predicted location information into the computing task joint scheduling model to compute task scheduling decision variables of the at least one mobile device;
inputting the pending tasks and the task scheduling decision variables into the resource allocation model to compute the scheduling strategy.
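The three-stage data flow described above can be sketched as a simple composition of three models. This is only an illustrative sketch: the function name `compute_scheduling_strategy`, the type aliases, and the callable signatures are hypothetical and are not taken from the application, which does not specify a programming interface.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Location = Tuple[float, float]   # hypothetical 2-D position of a mobile device
Task = Dict[str, float]          # hypothetical task record, e.g. {"bits": 8.0}


def compute_scheduling_strategy(
    tasks: Sequence[Task],
    current_locations: Sequence[Location],
    trajectory_model: Callable[[Sequence[Location]], List[Location]],
    joint_scheduling_model: Callable[[Sequence[Task], Sequence[Location]], List[int]],
    resource_allocation_model: Callable[[Sequence[Task], Sequence[int]], Dict[str, list]],
) -> Dict[str, list]:
    """Chain the three sub-models in the order given in the text."""
    # Stage 1: current locations -> predicted location information.
    predicted_locations = trajectory_model(current_locations)
    # Stage 2: pending tasks + predicted locations -> task scheduling decision variables.
    decisions = joint_scheduling_model(tasks, predicted_locations)
    # Stage 3: pending tasks + decision variables -> scheduling strategy.
    return resource_allocation_model(tasks, decisions)
```

Passing the sub-models as callables keeps the sketch independent of how each model is trained; any trained model exposing a matching call signature could be plugged in.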
In an optional implementation, the step of inputting the current location information into the UAV trajectory planning model to compute the predicted location information of the at least one mobile device may include:
performing motion prediction according to the current location information to obtain next location information of the at least one mobile device;
clustering the next location information of the at least one mobile device to obtain the predicted location information.
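The two sub-steps (motion prediction, then clustering) can be illustrated with the following sketch. It rests on explicit assumptions: the drawings mention an LSTM-based location predictor and an FCM-based clustering algorithm, but their details are not given in this passage, so the sketch substitutes a constant-velocity extrapolation for the predictor and uses a textbook fuzzy c-means (FCM) update. The function names and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict_next(prev: Point, curr: Point) -> Point:
    """Constant-velocity extrapolation; a stand-in for a learned predictor."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


def fuzzy_c_means(points: List[Point], centers: List[Point],
                  m: float = 2.0, iters: int = 50) -> List[Point]:
    """Textbook fuzzy c-means; returns the cluster centers after `iters` rounds."""
    centers = list(centers)
    n_clusters = len(centers)
    for _ in range(iters):
        # Membership u[i][j] of point i in cluster j:
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            u.append([
                1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(n_clusters))
                for j in range(n_clusters)
            ])
        # Each center becomes the mean of all points weighted by u_ij ** m.
        new_centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            new_centers.append((
                sum(wi * p[0] for wi, p in zip(w, points)) / total,
                sum(wi * p[1] for wi, p in zip(w, points)) / total,
            ))
        centers = new_centers
    return centers
```

In this reading, each device's next location is predicted first, and the cluster centers of those predicted locations serve as the aggregated location information; whether the application uses the centers in exactly this way is an assumption.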
In an optional implementation, the step of inputting the pending tasks and the predicted location information into the computing task joint scheduling model to compute the task scheduling decision variables of the at least one mobile device may include:
performing joint task scheduling training according to the pending tasks and the predicted location information to obtain decision actions of the at least one mobile device;
integrating the decision actions to obtain the task scheduling decision variables.
The present application provides a scheduling optimization apparatus applied to an electronic device that is communicatively connected to a mobile edge computing network system, the mobile edge computing network system including at least one base station, at least one UAV, and at least one mobile device. The scheduling optimization apparatus includes:
a task acquisition module, which may be configured to obtain pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;
a strategy acquisition module, which may be configured to input the pending tasks and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training an established initial model;
a strategy sending module, which may be configured to send the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to the at least one UAV for processing and forwards the second task via the at least one UAV to the at least one base station for processing.
In an optional implementation, the scheduling optimization apparatus is implemented as the task offloading apparatus according to the implementations of the present application.
The present application provides an electronic device, which may include a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the task offloading method and/or the scheduling optimization method according to any one of the foregoing implementations.
The present application provides a storage medium, which may include a computer program, wherein the computer program, when run, controls the electronic device in which the storage medium is located to execute the task offloading method and/or the scheduling optimization method according to any one of the foregoing implementations.
In the scheduling optimization method and apparatus, electronic device, and storage medium provided by the embodiments of the present application, the pending tasks and the current location information are input into a preset scheduling optimization model to obtain a scheduling strategy, and the scheduling strategy is sent to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task via the at least one UAV to at least one base station for processing. The first task is thus scheduled to the UAV and the second task to the base station, which avoids the low scheduling optimization efficiency of the related art, where tasks are either all executed locally on the mobile device or all scheduled to a UAV or base station for remote execution.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 is a structural block diagram of a data processing system provided by some embodiments of the present application.
FIG. 2 is a structural block diagram of a task offloading system provided by an embodiment of the present application.
FIG. 3 is a schematic flowchart of a task offloading method provided by an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a task offloading model provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a deep reinforcement learning model provided by an embodiment of the present application.
FIG. 6 is a schematic flowchart of an ECRA algorithm provided by an embodiment of the present application.
FIG. 7 is another schematic flowchart of the task offloading method provided by an embodiment of the present application.
FIG. 8 is a structural block diagram of a data processing system provided by other embodiments of the present application.
FIG. 9 is a structural block diagram of a scheduling optimization system provided by an embodiment of the present application.
FIG. 10 is a schematic flowchart of a scheduling optimization method provided by an embodiment of the present application.
FIG. 11 is another schematic flowchart of the scheduling optimization method provided by an embodiment of the present application.
FIG. 12 is a schematic structural diagram of a scheduling optimization model provided by an embodiment of the present application.
FIG. 13 is a schematic structural diagram of an LSTM network provided by an embodiment of the present application.
FIG. 14 is a schematic structural diagram of an LSTM-network-based mobile device location prediction model provided by an embodiment of the present application.
FIG. 15 is a schematic flowchart of an FCM-based mobile device clustering algorithm provided by an embodiment of the present application.
FIG. 16 is a schematic structural diagram of an actor neural network and a critic neural network provided by an embodiment of the present application.
FIG. 17 is a schematic flowchart of a DDPG-based computing task scheduling algorithm provided by an embodiment of the present application.
FIG. 18 is a schematic flowchart of a scheduling variable shaping and integration algorithm provided by an embodiment of the present application.
FIG. 19 is a structural block diagram of an electronic device provided by an embodiment of the present application.
FIG. 20 is a structural block diagram of a task offloading apparatus provided by an embodiment of the present application.
FIG. 21 is a structural block diagram of a scheduling optimization apparatus provided by an embodiment of the present application.
Reference numerals: 10-data processing system; 100-electronic device; 110-first memory; 120-first processor; 130-communication module; 200-task offloading system; 300-scheduling optimization system; 210-first device; 220-second device; 400-task offloading apparatus; 410-task acquisition module; 420-offloading strategy acquisition module; 430-offloading strategy sending module; 500-task scheduling apparatus; 510-task acquisition module; 520-scheduling strategy acquisition module; 530-scheduling strategy sending module.
Detailed Description
With the rapid development of wireless communication technology and the spread of smart mobile devices, the number of mobile applications has grown explosively in recent years. Applications such as face-recognition payment systems, online cloud gaming, and virtual/augmented reality (VR/AR) are computation-intensive and delay-critical, whereas the mobile devices that run them (such as smartphones and wearable devices) usually have limited computing power and battery capacity. This mismatch between computation-intensive applications and resource-constrained devices makes it challenging to improve users' quality of experience (QoE).
Mobile edge computing (MEC) is a promising technology: edge servers deployed in an edge computing network provide users' mobile devices with powerful computing and energy resources, and a mobile device can choose to offload computation-intensive tasks to an edge server to reduce task execution delay and save the battery energy of the local device. Meanwhile, with the development of wireless power transfer (WPT), the batteries of wireless user devices can be charged continuously over the air, which greatly extends battery life and eases the limitations caused by insufficient energy.
One of the key problems to be solved in a mobile edge computing network is computation offloading, that is, whether a wireless user device offloads a computing task to a nearby MEC server or executes it locally, and how resources (such as computing resources and energy resources) are allocated to the tasks offloaded to the server. A wireless network generally consists of multiple wireless user devices, and the time-varying channel conditions caused by device mobility complicate offloading scheduling. A good computation offloading strategy can improve the overall computing capability of the wireless user devices and enhance the performance of the mobile edge computing system. Much recent research and many recent inventions have therefore focused on designing efficient computation offloading and resource allocation strategies.
Some existing inventions and studies use dynamic programming and branch-and-bound methods for computing task offloading and resource allocation in mobile edge computing networks. However, these methods have high computational complexity when solving for the optimization variables and require a large amount of computing time, so they are suitable only for relatively simple network environments. Offloading optimization methods based on heuristic algorithms reduce computational complexity, but they usually need a large number of iterations to reach a satisfactory result, which may make them impractical for online computation offloading in a dynamic mobile edge computing network (that is, one with time-varying channel conditions caused by the movement of wireless user devices).
To mitigate at least one of the above technical problems, the embodiments of the present application provide a task offloading method and apparatus, an electronic device, and a storage medium. The technical solutions of the present application are described below through possible implementations.
The defects of the above solutions were all identified by the inventors through practice and careful study. Therefore, both the discovery of the above problems and the solutions proposed for them in the embodiments below should be regarded as contributions made by the inventors in the course of the invention.
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.
Accordingly, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
It should be noted that the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
It should be noted that, where no conflict arises, the features in the embodiments of the present application may be combined with each other.
FIG. 1 is a structural block diagram of a data processing system 10 provided by some embodiments of the present application and shows one possible implementation of the data processing system 10. Referring to FIG. 1, the data processing system 10 may include one or more of an electronic device 100 and a task offloading system 200.
The electronic device 100 is communicatively connected to the task offloading system 200. The electronic device 100 obtains the pending tasks of the task offloading system 200 and derives a task offloading strategy from them, so that the task offloading system 200 performs task offloading according to the task offloading strategy.
Optionally, the specific composition of the task offloading system 200 is not limited and can be set according to actual application requirements. For example, in one alternative example, the task offloading system 200 may include a second device 220 and at least one first device 210.
It should be noted that, in one alternative example, the electronic device 100 and the first device 210 may be the same device; in another alternative example, the electronic device 100 and the second device 220 may be the same device.
Optionally, the specific types of the first device 210 and the second device 220 are not limited and can be set according to actual application requirements. For example, in one alternative example, the first device 210 may be a wireless user device, and the second device 220 may be an edge computing server.
With reference to FIG. 2, a large-scale mobile edge computing network contains one edge computing server with a wireless access point (AP) and N wireless user devices, indexed by the set {1, 2, ..., N}. Each wireless user device can move within a certain range. The wireless access point has a sufficiently stable energy supply and can transfer power to the wireless user devices by radio frequency. Each wireless user device is equipped with a wireless transmission antenna, through which it can exchange data with the wireless access point and also receive energy from it. The energy received from the wireless access point is stored in the rechargeable battery of the wireless user device.
FIG. 3 shows one flowchart of the task offloading method provided by an embodiment of the present application. The method can be applied to, and is executed by, the electronic device 100 shown in FIG. 19 (described below). It should be understood that, in other embodiments, the order of some steps of the task offloading method of this embodiment may be exchanged according to actual needs, and some steps may be omitted or deleted. The flow of the task offloading method shown in FIG. 3 is described in detail below.
Step S310: obtaining pending tasks of at least one first device 210.
The pending tasks include a target task.
Step S320: inputting the pending tasks into a preset task offloading model to obtain a task offloading strategy.
The task offloading model is obtained by training an established system model.
Step S330: sending the task offloading strategy to the at least one first device 210, so that the at least one first device 210 offloads the target task to the second device 220 based on the task offloading strategy, and the second device 220 executes the target task.
In the above method, the pending tasks are input into the task offloading model to obtain a task offloading strategy, and the task offloading strategy is sent to the first device, so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thus offloaded to the server for processing, which avoids the low task offloading efficiency of the related art, where tasks are either all executed locally on the wireless user device or all offloaded to the server for remote execution.
Before step S310, the task offloading method provided by the present application may further include a step of obtaining the task offloading model, which may include:
establishing a system model and an optimization cost function according to cost parameters of the task offloading system 200; and training the system model according to the optimization cost function to obtain the task offloading model.
Optionally, the specific way of establishing the system model and the optimization cost function according to the cost parameters of the task offloading system 200 is not limited and can be set according to actual application requirements. For example, in one alternative example, it may include the following sub-steps:
establishing the system model according to cost parameters of the at least one first device 210 and the second device 220; and establishing the optimization cost function according to the system model.
In detail, the system model is established first. The system time can be divided into multiple constant time slices, denoted t∈{1,2,...}, each of length T seconds. It is assumed that when each wireless user device generates a computation-intensive task in time slice t, the execution time of the task does not exceed the length of one time slice. The MEC server at which the wireless access point is deployed has far more computing power than a wireless user device, so each wireless user device can choose either to execute its task remotely on the server by computation offloading or to execute it locally.
In each time slice t, the wireless channel gain between a wireless user device and the wireless access point strongly affects the efficiency of both wireless power transfer and task data transmission. The present application therefore uses
Figure PCTCN2022091260-appb-000001
to denote the channel gain between the i-th wireless user device and the wireless access point in time slice t. The time slice is short enough that the channel gain
Figure PCTCN2022091260-appb-000002
remains constant within the time slice. According to the Rayleigh fading channel model, the wireless channel gain can be expressed as
Figure PCTCN2022091260-appb-000003
where $\epsilon_t$ is an independent exponential random variable with unit mean, and
Figure PCTCN2022091260-appb-000004
is given by:
Figure PCTCN2022091260-appb-000005
where $A_g$ denotes the antenna gain, $f_c$ the carrier frequency, and $l_e$ the path-loss exponent, and
Figure PCTCN2022091260-appb-000006
denotes the distance between the i-th wireless user device and the wireless access point in the two-dimensional plane. As the formula shows, the wireless channel gain decreases as the distance
Figure PCTCN2022091260-appb-000007
increases.
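The channel model can be illustrated with a short sketch. The formula for the distance-dependent gain appears only as an image in this text, so the sketch assumes the commonly used free-space form, antenna gain × (c / (4π · carrier frequency · distance)) raised to the path-loss exponent; this form, the function names, and any numeric values used with them are assumptions rather than values confirmed by the application. The Rayleigh fading step follows the text: the large-scale gain is multiplied by an exponential random variable with unit mean.

```python
import math
import random

SPEED_OF_LIGHT = 3.0e8  # metres per second


def mean_channel_gain(antenna_gain: float, carrier_freq_hz: float,
                      path_loss_exp: float, distance_m: float) -> float:
    """Assumed free-space form: A_g * (c / (4 * pi * f_c * d)) ** l_e."""
    ratio = SPEED_OF_LIGHT / (4.0 * math.pi * carrier_freq_hz * distance_m)
    return antenna_gain * ratio ** path_loss_exp


def sample_channel_gain(mean_gain: float, rng: random.Random) -> float:
    """Rayleigh fading: scale the mean gain by a unit-mean exponential variable."""
    return rng.expovariate(1.0) * mean_gain
```

Under this assumed form the mean gain falls monotonically with distance, which matches the observation in the text that a larger distance gives a smaller channel gain.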
Next, the energy harvesting model of the wireless user devices is established. At the beginning of each time slice t, the edge computing server charges each user device for $q_t T$ seconds by wireless power transfer, where $q_t \in [0,1]$. The energy obtained by the i-th wireless user device is:
Figure PCTCN2022091260-appb-000008
where $\mu \in (0,1)$ denotes the efficiency of wireless energy harvesting, $P_i$ denotes the transmission power between the wireless access point and the user device, and $q_t$ denotes the fraction of the time slice used for wireless charging.
The present application assumes that the battery energy of each wireless user device is limited. At the end of time slice t (that is, at the beginning of time slice t+1), the remaining energy of the user device is:
Figure PCTCN2022091260-appb-000009
where $E_t$ is the energy consumed in time slice t, $H_t$ is the energy obtained by wireless power transfer in time slice t, and
Figure PCTCN2022091260-appb-000010
is the maximum energy the wireless user device can store. Normally, $M_{t+1}$ should be non-negative. If the current time slice does not have sufficient energy ($M_{t+1} < 0$), the wireless user device discards the current task, sets $M_{t+1}$ to 0, and re-executes the task in the next time slice.
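The energy harvesting and battery bookkeeping can be sketched as follows. Both formulas appear only as images in this text, so the sketch assumes the usual forms: harvested energy equal to efficiency × transmission power × channel gain × charging fraction × slice length, and a battery level equal to the previous level minus consumed energy plus harvested energy, capped at the battery capacity. These forms and all names are assumptions. The handling of a negative level (discard the task, reset to 0) follows the text.

```python
from typing import Tuple


def harvested_energy(mu: float, tx_power: float, channel_gain: float,
                     charge_fraction: float, slice_len_s: float) -> float:
    """Assumed form: mu * P * h * q * T."""
    return mu * tx_power * channel_gain * charge_fraction * slice_len_s


def next_battery_level(level: float, consumed: float, harvested: float,
                       capacity: float) -> Tuple[float, bool]:
    """Return (level at the start of the next slice, whether the task was discarded)."""
    # Assumed update: previous level - consumed + harvested, capped at capacity.
    candidate = min(level - consumed + harvested, capacity)
    if candidate < 0:
        # Not enough energy: discard the task and reset the level to 0,
        # so the task can be re-executed in the next time slice.
        return 0.0, True
    return candidate, False
```

For example, `next_battery_level(0.1, 0.5, 0.0, 2.0)` reports a discarded task because the device would otherwise end the slice with negative energy.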
Then the computing task model is established. In the present application, the task generated by the i-th wireless user device in time slice t,
Figure PCTCN2022091260-appb-000011
can be expressed as
Figure PCTCN2022091260-appb-000012
where
Figure PCTCN2022091260-appb-000013
denotes the data size of task
Figure PCTCN2022091260-appb-000014
(in bits), and
Figure PCTCN2022091260-appb-000015
denotes the number of CPU cycles required to process 1 bit of data. Thus, the number of CPU cycles required to execute task
Figure PCTCN2022091260-appb-000016
is
Figure PCTCN2022091260-appb-000017
Define W as the bandwidth of the wireless channel; interference between channels is negligible. If k wireless user devices offload their current tasks simultaneously in time slice t, the wireless bandwidth W is divided equally among the user devices that decide to offload.
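The task model and the bandwidth split amount to two small calculations, sketched below. The function and parameter names are hypothetical; the arithmetic follows the text, where the required cycle count is the task's data size in bits times the cycles needed per bit, and the bandwidth W is shared equally among the k devices that offload in the same time slice.

```python
def required_cycles(data_bits: float, cycles_per_bit: float) -> float:
    """CPU cycles needed to execute a task: data size (bits) x cycles per bit."""
    return data_bits * cycles_per_bit


def bandwidth_share(total_bandwidth_hz: float, num_offloading: int) -> float:
    """Equal share of the bandwidth W for each of the k offloading devices."""
    if num_offloading <= 0:
        raise ValueError("at least one device must be offloading")
    return total_bandwidth_hz / num_offloading
```

With a 2 MHz channel and four devices offloading in the same time slice, each device would receive 500 kHz under this equal-split rule.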
After obtaining the energy transferred from the wireless access point, each wireless user device needs to decide whether to offload its computing task to the edge server or to execute it locally, so that scheduling can be optimized to reduce the overall task delay and energy consumption. The present application adopts full offloading, that is, a task arriving in the current time slice is either executed locally on the wireless user device or executed remotely on the MEC server through computation offloading. Use
Figure PCTCN2022091260-appb-000018
to denote the offloading decision variable of the i-th wireless user device in time slice t, where
Figure PCTCN2022091260-appb-000019
indicates that the wireless user device chooses to offload the task to the edge computing server (edge computing), and
Figure PCTCN2022091260-appb-000020
indicates that the computing task is executed locally on the wireless user device. The two modes are described separately below:
1) Local computing model:
The wireless user equipment in the mobile edge computing network of this application can harvest energy wirelessly and perform local computing at the same time. Let
Figure PCTCN2022091260-appb-000021
denote the computing capability of the i-th wireless user equipment (unit: CPU cycles/second); different devices have different computing capabilities. The local computing delay
Figure PCTCN2022091260-appb-000023
for processing task
Figure PCTCN2022091260-appb-000022
is expressed as:
Figure PCTCN2022091260-appb-000024
The energy consumed by local computing,
Figure PCTCN2022091260-appb-000025
is:
Figure PCTCN2022091260-appb-000026
where
Figure PCTCN2022091260-appb-000027
denotes the energy consumed by the i-th wireless user equipment in one CPU cycle. Specifically,
Figure PCTCN2022091260-appb-000028
can be calculated by the following formula:
Figure PCTCN2022091260-appb-000029
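The local computing model can be illustrated with a short Python sketch. Because the formulas themselves are only available as figures, the sketch makes explicit assumptions: the delay is the number of required CPU cycles divided by the device's computing capability, the energy is the per-cycle energy multiplied by the number of cycles, and the per-cycle energy follows the commonly used form $\kappa f^2$. The coefficient `kappa` and all function and argument names are hypothetical.

```python
def local_cost(bits, cycles_per_bit, f_local, kappa=1e-27):
    """Return (delay_seconds, energy_joules) for executing a task locally.

    bits           -- task data size in bits
    cycles_per_bit -- CPU cycles needed to process one bit
    f_local        -- device computing capability in CPU cycles per second
    kappa          -- assumed effective switched-capacitance coefficient
    """
    cycles = bits * cycles_per_bit            # total CPU cycles for the task
    delay = cycles / f_local                  # assumed: cycles / capability
    energy_per_cycle = kappa * f_local ** 2   # assumed per-cycle energy model
    energy = energy_per_cycle * cycles
    return delay, energy
```

With these assumptions, a 1 Mbit task at 1000 cycles per bit on a 1 GHz device takes about 1 second and about 1 joule.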
2) Edge computing model:
If the i-th wireless user equipment chooses to offload task
Figure PCTCN2022091260-appb-000030
to the edge computing server for remote execution, the computation offloading process can be divided into three parts. First, the wireless user equipment uploads the task data to the edge computing server over the wireless link. Then, the edge computing server allocates computing resources to the offloaded task and completes the computation. Finally, the computation result is sent back to the corresponding wireless user equipment over the wireless link. Because the computation result is much smaller than the task data, this application ignores the transmission delay and energy consumption of downloading the result. Therefore, the computation offloading delay from the i-th wireless user equipment to the edge computing server can be expressed as:
Figure PCTCN2022091260-appb-000031
The time for the edge computing server to run task
Figure PCTCN2022091260-appb-000032
is:
Figure PCTCN2022091260-appb-000033
where
Figure PCTCN2022091260-appb-000034
denotes the computing resources that the edge server allocates to task
Figure PCTCN2022091260-appb-000035
(unit: CPU cycles/second). Let $F$ denote the total computing resources of the edge server; the following condition must be satisfied:
Figure PCTCN2022091260-appb-000036
That is, the total amount of computing resources that the edge server allocates to all offloaded tasks should be less than the computing resources $F$ of the entire server.
The energy that the i-th wireless user equipment consumes while waiting locally for task
Figure PCTCN2022091260-appb-000037
to be executed remotely on the edge server can be expressed as:
Figure PCTCN2022091260-appb-000038
where
Figure PCTCN2022091260-appb-000039
denotes the power consumption of the i-th wireless user equipment in the idle state.
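The edge computing model can be sketched in the same style. The upload-rate expression is not recoverable from the text, so this sketch assumes a Shannon-capacity rate over an equal bandwidth share $W/k$; the edge execution time is taken as required cycles divided by allocated resources, and the waiting energy as idle power multiplied by the edge execution time. The names `tx_power`, `gain`, `noise`, and `p_idle` are hypothetical.

```python
import math

def edge_cost(bits, cycles_per_bit, f_edge, bandwidth, k,
              tx_power, gain, noise, p_idle):
    """Return (upload_delay, edge_exec_time, idle_energy) for an offloaded task.

    Assumption: each of the k offloading devices gets bandwidth / k and
    transmits at the Shannon rate for its signal-to-noise ratio.
    """
    share = bandwidth / k                                   # equal split of W
    rate = share * math.log2(1 + tx_power * gain / noise)   # bits per second
    upload_delay = bits / rate
    edge_exec_time = bits * cycles_per_bit / f_edge         # cycles / allocated CPU
    idle_energy = p_idle * edge_exec_time                   # device idles while waiting
    return upload_delay, edge_exec_time, idle_energy
```

The download of the result is ignored here, matching the simplification stated in the text.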
Based on the network system model established above, this application jointly optimizes task offloading and resource allocation and proposes an optimization cost function that minimizes the total system cost. The optimization problem is formulated as follows:
Figure PCTCN2022091260-appb-000040
Figure PCTCN2022091260-appb-000041
Figure PCTCN2022091260-appb-000042
Figure PCTCN2022091260-appb-000043
$0 < q_t < 1, \quad (d)$
Figure PCTCN2022091260-appb-000044
Figure PCTCN2022091260-appb-000045
In the above formula, the optimization cost function of the entire system consists of two parts, the local computing cost and the cost of offloading computation to the edge server, denoted by
Figure PCTCN2022091260-appb-000046
and
Figure PCTCN2022091260-appb-000047
respectively, which are given by:
Figure PCTCN2022091260-appb-000048
Figure PCTCN2022091260-appb-000049
where $\omega_1$ and $\omega_3$ are the weights of the task processing delay, $\omega_2$ and $\omega_4$ are the weights of the energy consumption, and they satisfy $\{0 \le \omega_i \le 1 \mid \omega_i \in \{\omega_1, \omega_2, \omega_3, \omega_4\}\}$ with $\omega_1 + \omega_2 = 1$ and $\omega_3 + \omega_4 = 1$.
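A weighted cost of this kind can be sketched as follows. The exact cost expressions appear only as figures, so the sketch assumes that each part is a weighted sum of a delay and an energy term and that a decision value of 1 means "offload" and 0 means "execute locally". The function name `system_cost` and the tuple layout are hypothetical.

```python
def system_cost(x, local, edge, w=(0.5, 0.5, 0.5, 0.5)):
    """Total weighted cost over all devices for one time slice.

    x     -- list of 0/1 offloading decisions (assumed: 1 = offload)
    local -- list of (delay, energy) pairs for local execution
    edge  -- list of (delay, energy) pairs for offloaded execution
    w     -- (w1, w2, w3, w4) with w1 + w2 = 1 and w3 + w4 = 1
    """
    w1, w2, w3, w4 = w
    if abs(w1 + w2 - 1) > 1e-9 or abs(w3 + w4 - 1) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1 and w3 + w4 = 1")
    total = 0.0
    for xi, (t_loc, e_loc), (t_edge, e_edge) in zip(x, local, edge):
        c_local = w1 * t_loc + w2 * e_loc      # local computing cost
        c_edge = w3 * t_edge + w4 * e_edge     # offloading cost
        total += xi * c_edge + (1 - xi) * c_local
    return total
```

Only one of the two cost terms contributes for each device, which mirrors the full-offloading assumption.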
In problem P,
Figure PCTCN2022091260-appb-000050
denotes the offloading decision variables of all wireless user equipments,
Figure PCTCN2022091260-appb-000051
is the percentage of the total energy that the wireless user equipment consumes for offloading data, and
Figure PCTCN2022091260-appb-000052
is the resource allocation vector, each component of which is the computing resource that the edge server allocates to the corresponding uploaded task. This application stipulates that if wireless user equipment $i$ chooses to execute task
Figure PCTCN2022091260-appb-000053
locally, the edge server allocates no computing resources to it; that is, when
Figure PCTCN2022091260-appb-000054
then
Figure PCTCN2022091260-appb-000055
Constraint (a) states that each wireless user equipment either offloads its task to the server or executes it locally. Constraint (b) states that the computing resources that the edge server allocates to any wireless user equipment performing an offloaded task cannot exceed the maximum resource value. Constraint (c) ensures that the sum of the allocated computing resources does not exceed the maximum resources of the edge server. Constraint (f) stipulates that, in time slice $t$, the current energy of each wireless user equipment can neither exceed the maximum energy the equipment can provide nor be negative; otherwise a penalty term must be added.
Optionally, the specific manner of training the system model according to the optimization cost function to obtain the task offloading model is not limited and can be set according to actual application requirements. For example, in an alternative example, the task offloading model includes a first task offloading model and a second task offloading model, and the step of training the system model according to the optimization cost function to obtain the task offloading model may include the following sub-steps:
Splitting the optimization cost function to obtain a first optimization cost function and a second optimization cost function; training the system model according to the first optimization cost function to obtain the first task offloading model; and training the system model according to the second optimization cost function to obtain the second task offloading model.
In detail, the original optimization problem can be decomposed into two sub-problems: 1) task computation offloading and energy transmission for the wireless user equipment, and 2) computing resource and energy allocation for the edge computing server. With reference to Figure 4, a system optimization framework can be designed that handles the two sub-problems with a deep reinforcement learning method and the alternating direction method of multipliers, respectively.
Clearly, problem P is a mixed-integer nonlinear programming (MINLP) problem and is therefore non-convex. As the number of users $N$ increases, the computational complexity of the problem grows sharply, making it hard to solve directly. Therefore, considering the dependence among the four variables to be solved, $(x_t, f_t, q_t, h_t)$ (for example, if a component of $x_t$,
Figure PCTCN2022091260-appb-000056
is 0, then the corresponding components of $f_t$ and $h_t$ are also 0), this application decomposes the problem into the following two sub-problems, within each of which the variables to be solved do not depend on one another: 1) task computation offloading and energy transmission for the wireless user equipment (P1), that is, how to determine the values of $x_t$ and $q_t$; and 2) computing resource and energy allocation for the edge computing server (P2). Once the values of $x_t$ and $q_t$ are determined, solving for $f_t$ and $h_t$ becomes easy.
Optionally, the specific manner of training the system model according to the first optimization cost function to obtain the first task offloading model is not limited and can be set according to actual application requirements. For example, in an alternative example, the following sub-steps may be included:
A deep reinforcement learning model is established based on the system model; the deep reinforcement learning model is trained according to the first optimization cost function to obtain the first task offloading model.
In detail, for sub-problem P1, the optimization of the computation offloading decisions for the tasks generated by each wireless user equipment is still a non-convex problem. Traditional numerical optimization methods often need a large number of iterations to reach a satisfactory result, which makes them unsuitable for real-time MEC in a dynamic environment with time-varying channel gains. Therefore, this application uses reinforcement learning to schedule computation offloading in real time.
In a computation offloading environment where channel conditions and wireless user equipment locations change dynamically, the system state transition probabilities of the mobile edge computing network for sub-problem P1 usually cannot be obtained because of the high-dimensional state and action spaces. This application therefore uses a deep reinforcement learning method to let each wireless user equipment choose, based on the current system state, whether to offload the task arriving in time slice $t$ to the edge server.
Specifically, problem P1 can be expressed as:
Figure PCTCN2022091260-appb-000057
First, a reinforcement-learning-based method requires the state, action, and reward function of the problem to be defined, as follows:
State: In each time slice $t$, the state of the mobile edge computing network includes the distance $d_t$ between each wireless user equipment and the wireless access point, the channel gain $g_t$, the data size $b_t$ of each computing task currently being processed, and the energy $M_t$ available at the beginning of time slice $t$; that is, $s_t = [d_t, g_t, b_t, M_t]$.
Action: According to the definition of problem P1, the computation offloading vector $x_t$ of the wireless user equipments and the energy transmission variable $q_t$ must be determined; that is, $a_t = [x_t, q_t]$. Based on the observed state $s_t$, the reinforcement-learning-based method learns the system's state transition policy $\pi$ and thereby obtains an approximately optimal mapping from state $s_t$ to action $a_t$.
Reward function: Once the value of the action $a_t = [x_t, q_t]$ is determined, the values of $f_t$ and $h_t$ can be solved by the ECRA algorithm. The objective of the optimization problem is to minimize the sum of the system cost and the penalty term incurred by discarding tasks because of insufficient energy, whereas the objective of reinforcement learning is to maximize the reward. Therefore, the immediate reward function of the reinforcement learning algorithm can be defined as:
Figure PCTCN2022091260-appb-000058
where
Figure PCTCN2022091260-appb-000059
accounts for the case in which the energy of the wireless user equipment is insufficient to execute the task arriving in the current time slice (that is, $M_{t+1} < 0$). The task should then be discarded, so a penalty term is introduced to prevent this situation as far as possible. This application uses the indicator function $1\{\text{cond}\}$ to apply the task-failure penalty when the condition cond is satisfied, so the penalty cost function is expressed as:
Figure PCTCN2022091260-appb-000060
where $\lambda_1$ and $\lambda_2$ are the penalty weights and $|\cdot|$ denotes the absolute value.
After the problem is defined as above, this application builds on the twin delayed deep deterministic policy gradient (TD3) algorithm, improves its exploration strategy for complex high-dimensional action spaces, and proposes an RL-based approach for computation offloading and energy transmission (RLCOET). This avoids slow convergence, or convergence to a local optimum, caused by insufficient exploration of the action space.
The TD3 algorithm contains two critic networks and one actor (action) network. The two critic networks estimate two Q values (value predictions), namely
Figure PCTCN2022091260-appb-000061
and
Figure PCTCN2022091260-appb-000062
The actor network takes the current state as input and outputs the corresponding action. To speed up learning when the action space is high-dimensional, we improved the exploration-exploitation strategy of the original algorithm. The action $a_t$ generated by this strategy is combined with the ECRA optimization method to compute the remaining optimization variables of the current time slice, which in turn yields the current reward $R_t$ and the next state $s_{t+1}$. The tuple $(s_t, a_t, R_t, s_{t+1})$ is stored in the experience pool as the experience obtained from one interaction with the environment. In the neural network training phase, a batch of experiences with large loss values is selected, and the neural networks are trained using prioritized experience replay. The techniques used in the RLCOET algorithm are as follows:
1) Generation and selection of the candidate action set:
The action $a_t = [x_t, q_t]$ output by the actor network of the RLCOET algorithm lies in a high-dimensional space with $N+1$ dimensions in total. Exploring the action space by directly adding Gaussian noise is only suitable when there are few action variables; in a high-dimensional space it is hard for the neural network to learn the optimal policy through such exploration, so we improved the exploration strategy in the action space. Referring to Figure 5, the actor network has two branches. One branch predicts the energy transmission ratio $q_t$, a one-dimensional continuous variable between 0 and 1; during exploration, Gaussian noise is added to this term and the result is clipped so that it stays between 0 and 1. The other branch handles $x_t$, an $N$-dimensional discrete vector whose search space has size $2^N$. The actor network outputs a continuous relaxed decision variable
Figure PCTCN2022091260-appb-000063
from which the order-preserving quantization method generates $K$ discrete decision actions
Figure PCTCN2022091260-appb-000064
The order-preserving quantization method balances computational complexity against model performance and can search the action space of $x_t$ broadly even when $K$ is small. For each generated offloading decision vector
Figure PCTCN2022091260-appb-000065
the $f_t$ and $h_t$ obtained by the ECRA algorithm are substituted into the immediate reward function, which gives $K$ candidate reward values
Figure PCTCN2022091260-appb-000066
The action variable corresponding to the highest
Figure PCTCN2022091260-appb-000067
value is selected as the current optimal offloading decision, denoted
Figure PCTCN2022091260-appb-000068
namely:
Figure PCTCN2022091260-appb-000069
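The candidate generation and selection step can be illustrated with the sketch below. It implements one common formulation of order-preserving quantization from the binary-offloading literature: the first candidate thresholds the relaxed vector at 0.5, and each further candidate uses as its threshold the entry that is next closest to 0.5. Whether the application uses exactly this variant is an assumption, and `reward_fn` is a hypothetical stand-in for the reward obtained after running ECRA on a candidate.

```python
def order_preserving_quantize(x_relaxed, k):
    """Generate k binary offloading candidates from a relaxed vector in [0, 1]."""
    n = len(x_relaxed)
    if not 1 <= k <= n + 1:
        raise ValueError("k must be between 1 and len(x_relaxed) + 1")
    # Candidate 1: plain rounding at 0.5.
    candidates = [[1 if v > 0.5 else 0 for v in x_relaxed]]
    # Remaining candidates: thresholds are the entries closest to 0.5, in order.
    order = sorted(range(n), key=lambda i: abs(x_relaxed[i] - 0.5))
    for idx in order[:k - 1]:
        thr = x_relaxed[idx]
        candidates.append([
            1 if (v > thr or (v == thr and thr <= 0.5)) else 0
            for v in x_relaxed
        ])
    return candidates

def select_best(candidates, reward_fn):
    """Return the candidate with the highest reward."""
    return max(candidates, key=reward_fn)
```

For the relaxed vector `[0.2, 0.4, 0.7, 0.9]` and `k = 4`, the candidates are `[0, 0, 1, 1]`, `[0, 1, 1, 1]`, `[0, 0, 0, 1]`, and `[1, 1, 1, 1]`. Every candidate preserves the ordering of the relaxed entries, which is what lets a small $K$ cover distinct regions of the $2^N$ search space.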
2) Prioritized experience replay:
The experience $(s_t, a_t, R_t, s_{t+1})$ obtained each time the RLCOET algorithm interacts with the system environment is stored in the experience pool, where $a_t$ and $R_t$ are the best action and reward obtained in the action generation and selection step. During model training, we draw a batch of experience samples from the experience pool to update the actor network and the critic networks. Unlike the random sampling commonly used to train neural networks in reinforcement learning, this application uses prioritized experience replay: the experience pool is organized as a SumTree and samples are ranked by priority. A sample with a higher loss value has a higher priority and is more likely to be selected for updating the network parameters, which trains the networks more effectively and accelerates convergence. To prevent overfitting caused by some samples being selected too often, and to cope with the outliers that tend to appear early in training, randomness is added to sample selection so that lower-priority samples can also be chosen. The probability that sample $i$ is selected is:
Figure PCTCN2022091260-appb-000070
where $p_i$ is the priority of sample $i$, and $\upsilon$ controls how much prioritization is used.
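The sampling rule can be sketched as follows, assuming the standard prioritized-replay form $P(i) = p_i^{\upsilon} / \sum_k p_k^{\upsilon}$ (the formula itself is only available as a figure). For brevity the sketch computes the probabilities in linear time rather than with a SumTree, and it samples with replacement; the function names are hypothetical.

```python
import random

def sampling_probabilities(priorities, upsilon):
    """Assumed rule: P(i) = p_i ** upsilon / sum_k p_k ** upsilon."""
    scaled = [p ** upsilon for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_batch(priorities, upsilon, batch_size, rng=random):
    """Draw sample indices (with replacement) according to their priorities."""
    probs = sampling_probabilities(priorities, upsilon)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

With `upsilon = 0` every sample is equally likely, and with `upsilon = 1` the probability is proportional to the priority, so intermediate values keep low-priority samples reachable.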
3) Policy update:
Let the parameters of the actor network and the corresponding target actor network be denoted by $\eta$ and $\eta'$, and the parameters of the critic networks and the corresponding target critic networks be denoted by $\delta_i$ and $\delta'_i$, $i \in \{1, 2\}$. Because the two critic networks output different Q values, the smaller of the two Q values is chosen as the update target of the networks, namely:
Figure PCTCN2022091260-appb-000071
where a critic network associated with
Figure PCTCN2022091260-appb-000072
is used for the update, and $y_t$ is the update target of
Figure PCTCN2022091260-appb-000073
and
Figure PCTCN2022091260-appb-000074
Because the initial values of the network parameters differ, the smaller of the values predicted by the two critic networks is used to estimate the Q value from the start of training, which prevents the bias caused by overestimating the Q value. Each update introduces a small error, and after many updates these errors accumulate and degrade performance. In addition to using delayed policy updates to avoid excessive accumulation of bias, this application smooths the value estimate over a neighborhood of the target action to reduce error; that is, a certain amount of noise $\zeta$ is added to the output of the target actor network.
Figure PCTCN2022091260-appb-000075
where the noise $\zeta$ can be regarded as a form of regularization that makes value-function updates smoother and makes the prediction of the target Q value $Q_{target}$ more accurate and robust.
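The clipped double-Q target and the target smoothing can be illustrated with scalar values. The sketch assumes the usual TD3 form, in which the target is the reward plus a discounted minimum of the two target-critic estimates, and the smoothing noise is clipped Gaussian noise; the discount factor `gamma`, the noise parameters, and the function names are assumptions that do not appear in the text.

```python
import random

def td3_target(reward, q1_next, q2_next, gamma=0.99):
    """Assumed target: y = R + gamma * min(Q1', Q2')."""
    return reward + gamma * min(q1_next, q2_next)

def smoothed_target_action(action, sigma=0.2, clip=0.5,
                           low=0.0, high=1.0, rng=random):
    """Add clipped Gaussian noise to a target action and keep it in [low, high]."""
    noise = max(-clip, min(clip, rng.gauss(0.0, sigma)))
    return max(low, min(high, action + noise))
```

Taking the minimum of the two estimates counteracts overestimation, while the clipped noise averages the target over nearby actions instead of a single point.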
The estimates of the critic networks,
Figure PCTCN2022091260-appb-000076
approximate the target value $y_t$, and their loss function $L$ is computed as follows:
Figure PCTCN2022091260-appb-000077
Because the action $a_t$ contains a discrete vector ($x_t$) and a continuous variable ($q_t$), the loss function of the actor network also has two parts. For the variable $q_t$, the gradient of the loss function is derived to update the parameters of the actor network, as follows:
Figure PCTCN2022091260-appb-000078
where $N_m$ is the number of samples selected from the prioritized experience replay pool. For the offloading vector $x_t$, the average cross-entropy loss is used to update the parameters $\eta$ of the actor network:
Figure PCTCN2022091260-appb-000079
where $x_t$ is the offloading-vector part of $a_t$. In summary, the total loss function for updating the actor network is:
Figure PCTCN2022091260-appb-000080
where $\lambda_g$ is the weight of the loss term for the variable $q_t$.
Optionally, the specific manner of training the system model according to the second optimization cost function to obtain the second task offloading model is not limited and can be set according to actual application requirements. For example, in an alternative example, the following sub-steps may be included:
An alternating direction method of multipliers model is established based on the system model; the alternating direction method of multipliers model is trained according to the second optimization cost function to obtain the second task offloading model.
In detail, for sub-problem P2, the variables to be solved are subject to a large number of constraints, so it is hard for reinforcement learning to obtain a good policy within a limited time. Once problem P1 has been solved, however, the original problem P becomes a convex optimization problem that can be solved with conventional convex optimization algorithms. Based on the alternating direction method of multipliers (ADMM), this application proposes an Energy and Computation-Resource Allocation (ECRA) algorithm to solve P2, with a time complexity of only O(N).
That is, the alternating direction method of multipliers can be used to obtain the computing resources and the energy allocation ratio assigned to each task uploaded to the edge server. The RLCOET reinforcement learning algorithm yields the optimization variables $x_t$ and $q_t$ of problem P1. In this step, an ADMM-based method is used to solve problem P2. ADMM is a computational framework for solving optimization problems and is suited to large-scale distributed convex optimization. Through a "decomposition-coordination" procedure, ADMM decomposes a large global problem into several smaller sub-problems that are easy to solve and coordinates their solutions to obtain the solution of the global problem. This method overcomes the drawback that the coefficient of the penalty term tends to infinity near the optimal solution. To convert the original optimization problem P into a form that ADMM can solve easily, two additional variables, $\psi_t$ and
Figure PCTCN2022091260-appb-000081
are introduced, which leads to the ECRA algorithm. The converted problem P2 can be expressed as:
Figure PCTCN2022091260-appb-000082
Figure PCTCN2022091260-appb-000083
Figure PCTCN2022091260-appb-000084
Figure PCTCN2022091260-appb-000085
Figure PCTCN2022091260-appb-000086
Figure PCTCN2022091260-appb-000087
Figure PCTCN2022091260-appb-000088
Figure PCTCN2022091260-appb-000089
When
Figure PCTCN2022091260-appb-000090
the values of
Figure PCTCN2022091260-appb-000091
and
Figure PCTCN2022091260-appb-000092
are independent of the devices that execute their tasks locally. P2 is thus transformed into a constrained optimization problem with two classes of variables, a structure that makes it easy to handle the regularization term in the optimization objective. P2 is solved using the ADMM algorithm with the augmented Lagrangian, as follows:
Figure PCTCN2022091260-appb-000093
where $\alpha = \{f_t, h_t\}$, $\beta = \{\psi_t, z_t\}$, and $\epsilon = \{\theta_t, \tau_t\}$. The penalty coefficient $\rho$ ($\rho > 0$) is a fixed value. The above optimization problem is solved by iteratively updating the values of $\alpha$, $\beta$, and $\epsilon$. Assuming the variables in round $j$ are $\alpha^{j}, \beta^{j}, \epsilon^{j}$, the variables are updated in round $j+1$ as follows:
1) Given the variables $\{\beta^{j}, \epsilon^{j}\}$ of round $j$, update $\alpha^{j+1}$ by minimizing the above expression, namely:
Figure PCTCN2022091260-appb-000094
Here $L_\rho(\alpha, \beta^{j}, \epsilon^{j})$ contains a summation over $N$ terms, so it can be decomposed into $N$ sub-problems that are computed in parallel. Each sub-problem can be expressed as:
Figure PCTCN2022091260-appb-000095
Figure PCTCN2022091260-appb-000096
In this way, the above formula is transformed into a constrained convex optimization problem whose solution can be obtained with conventional optimization algorithms. From the solutions of the $N$ sub-problems we therefore obtain the value of $\alpha^{j+1}$. The computational complexity of each sub-problem is O(1), and the total complexity of the $N$ sub-problems is O(N).
2)上一步得到α j+1的值后,就可以在给定α j+1和∈ j的情况下,更新β的值,使L(α,β,∈)最小化,可以将此 步骤的优化问题表示为: 2) After obtaining the value of α j+1 in the previous step, the value of β can be updated to minimize L(α, β, ∈) given α j+1 and ∈ j , and this step can be The optimization problem of is expressed as:
Figure PCTCN2022091260-appb-000097
The computational complexity of this problem is O(N).
3) Once α_{j+1} and β_{j+1} have been obtained, the value of ∈_{j+1} is updated by minimizing L(α, β, ∈), as shown in the following expression:
Figure PCTCN2022091260-appb-000098
Specifically, the computational complexity of this problem is O(N).
The three steps above are executed iteratively until both of the following conditions are met: the absolute error
Figure PCTCN2022091260-appb-000099
and the relative error
Figure PCTCN2022091260-appb-000100
are both less than the given thresholds. With the ADMM-based method, problem P2 can be solved by the ECRA algorithm shown in Figure 6, and the convergence of the algorithm can be guaranteed; the convergence behavior depends on ρ. From the per-step complexity analysis above, the complexity of the overall algorithm is O(N). It is worth noting that, because the original problem is non-convex, the algorithm is not guaranteed to find the optimal solution of the original problem, but the error between the approximate solution it produces and the optimal solution is within a controllable range.
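The three-step update and the stopping rule described above follow the general ADMM pattern. As a minimal sketch only, the Python code below applies that pattern to a toy consensus problem (minimize the sum of 0.5·(x_i − a_i)² subject to x_i = z), not to problem P2 itself, whose objective and constraints are given in the figures. The function name `admm_consensus`, the toy objective, and the use of primal and dual residual norms as the two stopping quantities are assumptions made for illustration.

```python
import math

def admm_consensus(a, rho=1.0, tol=1e-8, max_iter=1000):
    """Toy ADMM: minimize sum_i 0.5*(x_i - a_i)**2 subject to x_i = z."""
    n = len(a)
    x = [0.0] * n      # local variables, one per sub-problem
    z = 0.0            # shared (consensus) variable
    u = [0.0] * n      # scaled dual variables
    for it in range(1, max_iter + 1):
        # Step 1: N independent sub-problems, each solved in closed form (O(1) each).
        x = [(a_i + rho * (z - u_i)) / (1.0 + rho) for a_i, u_i in zip(a, u)]
        # Step 2: update the shared variable with x and u held fixed (O(N)).
        z_old = z
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Step 3: update the dual variables (O(N)).
        u = [u_i + x_i - z for u_i, x_i in zip(u, x)]
        # Stop when both residuals fall below the threshold (assumed criteria).
        primal = math.sqrt(sum((x_i - z) ** 2 for x_i in x))
        dual = rho * math.sqrt(n) * abs(z - z_old)
        if primal < tol and dual < tol:
            break
    return z, it

z_star, iterations = admm_consensus([1.0, 2.0, 6.0])
print(z_star, iterations)  # z_star approaches the mean of the inputs, 3.0
```

In this toy problem the fixed penalty coefficient `rho` changes how quickly the iterates contract, which mirrors the remark that convergence depends on ρ.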
Finally, based on the results computed by the deep reinforcement learning model and the alternating direction method of multipliers (ADMM) model, an effective optimization algorithm can be applied to train the model until the requirements are met, which yields the task offloading model.
It should be noted that the overall reinforcement-learning-based scheduling optimization method is shown in Figure 7 and corresponds to the steps of training the deep reinforcement learning model and the ADMM model. First, the parameters of the critic network and the actor (action) network are initialized, together with the parameters of the target critic network and the target actor network, the experience data in the experience pool, the parameters of the large-scale UAV-assisted mobile edge computing network model, and the training round counter t = 1. Second, it is determined whether the current random probability is less than a preset value. If so, the current action is output directly; if not, K groups of candidate solutions are quantized, Gaussian noise is added to the action, and the optimal action is selected. Then, the resource and energy allocation optimization variables are computed with the ECRA algorithm, the next state and the immediate reward are obtained, and the experience is stored in the experience pool. A batch of experiences is drawn from the experience pool according to the prioritized experience replay strategy, the neural network parameters are updated, and t is set to t + 1. If t is less than T, the procedure returns to the random-probability check; otherwise it ends.
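The action-selection branch of the flow above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure of Figure 7: the names `quantize` and `select_action`, the 0.5 threshold, the way noise is applied before quantization, and the caller-supplied `score` function (which stands in for whatever evaluation ranks the candidates) are all hypothetical.

```python
import random

def quantize(relaxed, threshold=0.5):
    """Map a relaxed action in [0, 1]^n to a binary offloading vector."""
    return [1 if v > threshold else 0 for v in relaxed]

def select_action(relaxed, score, k=4, direct_prob=0.1, sigma=0.1, rng=None):
    """Pick a binary action from the relaxed network output.

    With probability direct_prob the quantized output is returned directly.
    Otherwise K candidates are built by adding Gaussian noise to the relaxed
    action and quantizing, and the candidate with the highest score is kept.
    """
    rng = rng or random.Random()
    if rng.random() < direct_prob:
        return quantize(relaxed)
    candidates = []
    for _ in range(k):
        noisy = [v + rng.gauss(0.0, sigma) for v in relaxed]
        candidates.append(quantize(noisy))
    return max(candidates, key=score)

# Hypothetical score: prefer candidates that offload fewer tasks.
action = select_action([0.9, 0.2, 0.6], score=lambda a: -sum(a),
                       rng=random.Random(0))
print(action)
```

Passing a seeded `random.Random` keeps the sketch reproducible; in a training loop the score would come from the environment rather than from a fixed rule.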
Regarding step S320, it should be noted that the specific manner of obtaining the task offloading strategy is not limited and may be set according to actual application requirements. For example, in one alternative example, the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, and the step of inputting the task to be processed into the preset task offloading model to obtain the task offloading strategy may include the following sub-steps:
The task to be processed is input into the first task offloading model to obtain the first task offloading strategy; the task to be processed is input into the second task offloading model to obtain the second task offloading strategy.
The first task offloading strategy may include, for each wireless user device, the computation offloading decision variable and the proportion of time occupied by wireless charging of the device. The second task offloading strategy may include, for each task uploaded to the edge server, the amount of computing resources allocated and the energy allocation ratio.
That is, the embodiments of the present application provide an efficient online offloading method for large-scale mobile edge computing networks, including the following sub-steps:
Step 1. Construct a system model for a large-scale mobile computing network and formulate an optimization objective function based on the execution delay and energy consumption of the tasks offloaded by the wirelessly charged devices.
Step 2. Decompose the original optimization problem into two sub-problems: 1) task computation offloading and energy transfer for the wireless user devices, and 2) computing resource and energy allocation at the edge computing server. Design system optimization frameworks for them based on deep reinforcement learning and on the alternating direction method of multipliers (ADMM), respectively.
Step 3. For sub-problem 1 of Step 2, use a method based on deep reinforcement learning to obtain, for each wireless user device, the computation offloading decision variable and the proportion of time occupied by wireless charging of the device.
Step 4. For sub-problem 2 of Step 2, use ADMM to obtain, for each task uploaded to the edge server, the amount of computing resources allocated and the energy allocation ratio.
Step 5. Based on the results of Step 3 and Step 4, apply an effective optimization algorithm to train the model until the requirements are met.
The present application uses a new computation offloading method for mobile edge computing networks. The proposed RLCOET algorithm obtains an efficient offloading strategy by learning from the experience of interacting with a dynamic edge computing network environment in which the wireless user devices move. Compared with traditional optimization methods, the method of the present application reduces the need to solve the scheduling optimization through repeated iterative computation, and it gives all tasks a satisfactory computation delay and lower energy consumption. Most existing learning-based methods optimize all scheduling variables together, which may cause convergence difficulties when many variables must be solved. By contrast, the present algorithm decomposes the overall optimization problem into two sub-problems (computation offloading and energy transfer; computing resource and energy allocation) and solves them separately, which effectively reduces the complexity of the algorithm. By improving the action-variable generation strategy and the experience sampling strategy of the deep learning algorithm, the proposed algorithm converges easily and obtains a near-optimal computation offloading strategy in MEC networks with large-scale scheduling variables.
The task offloading method according to one aspect of the present application is based on a mobile edge computing network. However, network infrastructure may be unavailable (for example, at a rescue site after a natural disaster), network equipment may be sparsely distributed (for example, in field operations), or a temporary surge of mobile devices may far exceed the service capacity of the network (for example, at large sporting events or gatherings). In such cases, unmanned aerial vehicles (UAVs), which are highly mobile and flexible, can serve as communication relay stations or edge computing platforms. In recent years, researchers have deployed wireless communication nodes on UAVs to establish communication with users' mobile devices (MDs) and have proposed using UAVs to assist mobile edge computing (MEC) in a variety of application scenarios. Once computing resources are deployed on UAVs, a UAV-assisted mobile edge computing network offers many advantages, such as lower network overhead, lower execution latency for computing tasks, better quality of experience (QoE), and longer battery life for mobile devices.
In UAV-assisted mobile edge computing, appropriate decisions must be made about the UAV trajectories and about the offloading of computing tasks in the mobile edge computing network (whether a computing task is executed locally on the mobile device or offloaded to an edge server) in order to obtain the desired performance. Specifically, existing research and inventions minimize the computation delay or energy consumption of all mobile devices by optimizing the UAV trajectories, the task offloading ratios, and the task scheduling, thereby ensuring the reliability of the whole edge computing network.
Existing UAV-assisted edge computing systems often use only one or more UAVs as edge computing devices to ensure low latency and reliability for the transmission of computing tasks in the network. Because of the limitations of current UAV technology and the weak computing capability of the computing devices deployed on UAVs, an edge computing network assisted by UAVs alone cannot provide satisfactory service to multiple mobile devices.
Therefore, a more promising approach is to build the mobile edge computing network among mobile devices, UAVs, and cellular base stations (BSs). However, some existing edge computing networks composed of mobile devices, UAVs, and base stations contain only one UAV. Because that UAV serves both as the computing device for edge services and as the relay that forwards tasks, the computing demands of multiple mobile devices cannot be met at the same time, and the task computation delay of the network increases.
With reference to the accompanying drawings, the following describes a data processing system, a scheduling optimization system, and a scheduling optimization method according to another aspect of the present application, for the case where the mobile edge computing network is built among mobile devices, UAVs, and cellular base stations.
Figure 8 is a structural block diagram of a data processing system 10 provided by other embodiments of the present application and shows one possible implementation of the data processing system 10. Referring to Figure 8, the data processing system 10 may include one or more of an electronic device 100 and a scheduling optimization system 300.
The electronic device 100 is communicatively connected to the scheduling optimization system 300. The electronic device 100 obtains the tasks to be processed and the positions from the scheduling optimization system 300 and derives a scheduling strategy from them, so that the scheduling optimization system 300 performs scheduling optimization according to the scheduling strategy.
Optionally, the specific composition of the scheduling optimization system 300 is not limited and may be set according to actual application requirements. For example, in one alternative example, the scheduling optimization system 300 may include at least one base station, UAV, and mobile device.
It should be noted that, in one alternative example, the electronic device 100 and the mobile device may be the same device; in another alternative example, the electronic device 100 and the UAV may be the same device; in yet another alternative example, the electronic device 100 and the base station may be the same device.
Optionally, the number of base stations is not limited and may be set according to actual application requirements. For example, in one alternative example, there may be one base station.
That is, to solve the problem that an edge computing network composed of mobile devices, UAVs, and base stations has high task computation delay and cannot serve multiple mobile devices with computing demands at the same time, the present application, with reference to Figure 9, establishes a mobile edge computing network composed of a single base station, multiple UAVs, and a large number of mobile devices. A computing task generated by a mobile device in the network can be executed on the mobile device itself, offloaded to one of the UAVs for simple computation, or transmitted further to the base station for more intensive computation.
Figure 10 shows one of the flowcharts of the scheduling optimization method provided by the embodiments of the present application. The method can be applied to, and executed by, the electronic device 100 shown in Figure 19 (described below). It can be understood that the scheduling optimization apparatus according to the embodiments of the present application may be implemented by the task offloading apparatus described in some embodiments of the present application. In addition, in other embodiments the order of some steps of the scheduling optimization method of this embodiment may be exchanged as needed, and some steps may be omitted or deleted. The flow of the scheduling optimization method shown in Figure 10 is described in detail below.
Step S410: obtain the tasks to be processed and the current position information of at least one mobile device.
The tasks to be processed include a first task and a second task.
Step S420: input the tasks to be processed and the current position information into a preset scheduling optimization model to obtain a scheduling strategy.
The scheduling optimization model is obtained by training an established initial model.
Step S430: send the scheduling strategy to the at least one mobile device, so that, based on the scheduling strategy, the at least one mobile device sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing.
The method above obtains the scheduling strategy by inputting the tasks to be processed and the current position information into the preset scheduling optimization model, and sends the scheduling strategy to the at least one mobile device. Based on the scheduling strategy, the mobile device sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. The first task is thus scheduled to a UAV and the second task to a base station, which avoids the low scheduling-optimization efficiency of the related art, in which tasks are either all executed locally on the mobile device or all scheduled to a UAV or base station for remote execution.
It should be noted that, before step S410, the scheduling optimization method provided by the embodiments of the present application may further include a step of obtaining the scheduling optimization model. With reference to Figure 11, this step may include the following sub-steps:
Step S440: establish an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system.
Step S450: train the initial model according to the optimization objective function to obtain the scheduling optimization model.
Regarding step S440, it should be noted that the specific manner of establishing the initial model and the optimization objective function is not limited and may be set according to actual application requirements. For example, in one alternative example, step S440 may include the following sub-steps:
Establish the initial model according to the initial parameters of at least one base station, UAV, and mobile device; establish the optimization objective function according to the initial model.
The initial model may include a system model, a computation model, and a communication model of the mobile edge computing network system, and the step of establishing the initial model may include the following sub-steps:
1. Establish the system model:
The network architecture of the system model established in the present application has three layers: mobile devices on the ground, UAVs in the air, and a remote base station. The positions of all three can be expressed in a three-dimensional Cartesian coordinate system. The total execution time of the tasks to be processed is denoted T and is divided evenly into N time slices. The set of time slices can be expressed as:
Figure PCTCN2022091260-appb-000101
Each time slice has length τ = T/N, and each time slice is assumed to be short enough that the position of each UAV remains unchanged within it. Considering that computing tasks may be congested, the network system assumes that mobile devices cannot communicate directly with the base station and can offload tasks to the base station only with the help of a UAV.
In the network system, the set of mobile devices can be expressed as:
Figure PCTCN2022091260-appb-000102
where M denotes the number of mobile devices. In time slice TS_n, the position of mobile device MD_m can be expressed as:
Figure PCTCN2022091260-appb-000103
where
Figure PCTCN2022091260-appb-000104
and
Figure PCTCN2022091260-appb-000105
denote the horizontal-plane coordinates of mobile device MD_m,
Figure PCTCN2022091260-appb-000106
In time slice TS_n, each mobile device MD_m generates one computation-intensive task, which can be expressed as:
Figure PCTCN2022091260-appb-000107
where
Figure PCTCN2022091260-appb-000108
denotes the data size, in bits, of the current task
Figure PCTCN2022091260-appb-000109
and
Figure PCTCN2022091260-appb-000110
denotes the number of CPU cycles needed to process each bit. T_req denotes the maximum time allowed for executing the current task
Figure PCTCN2022091260-appb-000111
Without loss of generality, all tasks have the same maximum allowed execution time. In addition, T_req is smaller than τ, which ensures that every task can be completed within one time slice.
Each mobile device MD_m has an onboard CPU whose maximum computing frequency is denoted by
Figure PCTCN2022091260-appb-000112
By dynamically adjusting the voltage and frequency of the CPU, the actual CPU frequency of mobile device MD_m in time slice TS_n,
Figure PCTCN2022091260-appb-000113
can be controlled adaptively to improve energy efficiency. Therefore,
Figure PCTCN2022091260-appb-000114
should satisfy:
Figure PCTCN2022091260-appb-000115
where all mobile devices are assumed to have the same maximum computing capability
Figure PCTCN2022091260-appb-000116
In this system, the set of UAVs can be expressed as:
Figure PCTCN2022091260-appb-000117
where U denotes the number of UAVs. In time slice TS_n, the position of UAV_u can be expressed as:
Figure PCTCN2022091260-appb-000118
where
Figure PCTCN2022091260-appb-000119
and
Figure PCTCN2022091260-appb-000120
denote the horizontal-plane coordinates of UAV_u,
Figure PCTCN2022091260-appb-000121
and H denotes the altitude of the UAVs.
The maximum flight speed of each UAV is assumed not to exceed V_max, which can be expressed as:
Figure PCTCN2022091260-appb-000122
where v_u(n) denotes the speed of UAV_u in time slice TS_n. In addition, to ensure flight safety, the distance between any two UAVs should be greater than the minimum allowed distance d_min, that is:
Figure PCTCN2022091260-appb-000123
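The two UAV constraints stated above (speed not exceeding V_max, pairwise separation greater than d_min) can be checked with a few lines of code. The Python sketch below is illustrative only. It assumes that all UAVs fly at the same altitude H, so only horizontal coordinates matter, and that the speed in a time slice is estimated as horizontal displacement divided by the slice length τ; the exact expressions are those in the figures. The function name `uav_moves_feasible` is hypothetical.

```python
import math
from itertools import combinations

def uav_moves_feasible(prev_pos, next_pos, tau, v_max, d_min):
    """Check the speed and separation constraints for one time slice.

    prev_pos, next_pos: lists of (x, y) horizontal UAV positions in two
    consecutive time slices; tau: slice length; v_max: speed limit;
    d_min: minimum allowed distance between any two UAVs.
    """
    # Speed constraint: displacement over one slice must not exceed v_max.
    for (x0, y0), (x1, y1) in zip(prev_pos, next_pos):
        if math.hypot(x1 - x0, y1 - y0) / tau > v_max:
            return False
    # Safety constraint: every pair must be strictly farther apart than d_min.
    for (xa, ya), (xb, yb) in combinations(next_pos, 2):
        if math.hypot(xa - xb, ya - yb) <= d_min:
            return False
    return True

ok = uav_moves_feasible([(0, 0), (100, 0)], [(10, 0), (100, 10)],
                        tau=1.0, v_max=20.0, d_min=5.0)
print(ok)
```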
The energy consumed by UAV_u in time slice TS_n can be expressed as:
Figure PCTCN2022091260-appb-000124
where
Figure PCTCN2022091260-appb-000125
and M_g denotes the weight of UAV_u.
Each UAV can be deployed as an edge server whose maximum computing capability is denoted by
Figure PCTCN2022091260-appb-000126
In time slice TS_n, for a computing task that is to be uploaded to a UAV and executed there, the CPU computing resources allocated by UAV_u can be expressed as
Figure PCTCN2022091260-appb-000127
and satisfy:
Figure PCTCN2022091260-appb-000128
All UAVs are assumed to have the same maximum computing capability
Figure PCTCN2022091260-appb-000129
The position of the base station can be expressed as:
Figure PCTCN2022091260-appb-000130
where x_BS and y_BS denote the horizontal-plane coordinates of the base station. Because the base station and the UAVs are located at high altitude, the base station is connected to the UAVs through line-of-sight wireless links and is not directly connected to the mobile devices. In this case, the UAVs act as relays that forward the tasks offloaded by the mobile devices to the base station for further computation. Because the base station has powerful computing servers and an ample energy supply, the execution time of computing tasks at the base station is negligible, and the energy consumed by the tasks executed at the base station is not considered.
All computing tasks in the system follow full offloading: each computing task is executed entirely locally, offloaded entirely to UAV_u for execution, or further offloaded entirely to the base station for execution. The task scheduling decision variable
Figure PCTCN2022091260-appb-000131
represents the offloading status of computing task
Figure PCTCN2022091260-appb-000132
as follows:
Figure PCTCN2022091260-appb-000133
where
Figure PCTCN2022091260-appb-000134
indicates that computing task
Figure PCTCN2022091260-appb-000135
is offloaded to computing platform k.
It is worth noting that when task
Figure PCTCN2022091260-appb-000136
is executed on mobile device MD_m or on a UAV, exactly one
Figure PCTCN2022091260-appb-000137
has the value 1 and all others are 0, that is,
Figure PCTCN2022091260-appb-000138
or
Figure PCTCN2022091260-appb-000139
When task
Figure PCTCN2022091260-appb-000140
is executed at the base station, in addition to
Figure PCTCN2022091260-appb-000141
the variable of the UAV through which the task is offloaded must also be 1, that is,
Figure PCTCN2022091260-appb-000142
because one of the UAVs must act as the relay from the mobile device to the base station. In summary, the variable
Figure PCTCN2022091260-appb-000143
should satisfy the following constraints:
Figure PCTCN2022091260-appb-000144
In addition, each UAV is assumed to offload at most one task to the BS for further execution in each time slice. Therefore,
Figure PCTCN2022091260-appb-000145
should satisfy:
Figure PCTCN2022091260-appb-000146
where
Figure PCTCN2022091260-appb-000147
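The constraints on the binary scheduling variable described above can be expressed as a small validity check. The Python sketch below is an illustration under stated assumptions: the index layout (0 for local execution, 1 to U for the UAVs, U + 1 for the base station) and the function name `valid_schedule` are hypothetical, and it is additionally assumed that a task relayed to the base station is not also marked as local. The exact formulation is the one in the figures.

```python
def valid_schedule(decisions, num_uavs):
    """Validate the decision vectors of all tasks in one time slice.

    decisions[m] is a 0/1 list of length num_uavs + 2 for the task of MD_m:
    index 0 = local, 1..num_uavs = UAVs, num_uavs + 1 = base station.
    """
    bs = num_uavs + 1
    relayed = [0] * (num_uavs + 1)  # tasks each UAV forwards to the BS
    for a in decisions:
        if len(a) != num_uavs + 2 or any(v not in (0, 1) for v in a):
            return False
        uav_ones = [u for u in range(1, num_uavs + 1) if a[u] == 1]
        if a[bs] == 1:
            # BS execution: exactly one UAV acts as relay, and not local.
            if a[0] != 0 or len(uav_ones) != 1:
                return False
            relayed[uav_ones[0]] += 1
        elif sum(a) != 1:
            # Local or UAV execution: exactly one entry equals 1.
            return False
    # Each UAV forwards at most one task to the BS per time slice.
    return all(count <= 1 for count in relayed)

print(valid_schedule([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]], num_uavs=2))
```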
It should be added that, because of the introduction of the variable
Figure PCTCN2022091260-appb-000148
the constraints on the computing resources allocated by the mobile devices and the UAVs become:
Figure PCTCN2022091260-appb-000149
Figure PCTCN2022091260-appb-000150
2. Establish the computation model:
Computing tasks can be executed on the mobile devices, the UAVs, and the base station, referred to as local computing, UAV-side computing, and BS-side computing, respectively. If task
Figure PCTCN2022091260-appb-000151
is computed locally, that is,
Figure PCTCN2022091260-appb-000152
the computation time of the task is:
Figure PCTCN2022091260-appb-000153
The energy consumed is:
Figure PCTCN2022091260-appb-000154
where κ_m and v_m are positive coefficients that depend on the CPU of mobile device MD_m.
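As a worked illustration of the local computation model, the Python sketch below computes a task's local execution time from its data size, cycles per bit, and CPU frequency, and then an energy estimate. The time expression follows directly from the definitions (bits × cycles per bit ÷ cycles per second). The energy expression is an assumption: it uses a common power-law CPU model in which power is κ·f^v, which may differ from the expression in the figure. The function names and the numeric values are hypothetical.

```python
def local_time(data_bits, cycles_per_bit, cpu_hz):
    """Seconds needed to process the task on a CPU running at cpu_hz."""
    return data_bits * cycles_per_bit / cpu_hz

def local_energy(data_bits, cycles_per_bit, cpu_hz, kappa, nu):
    """Energy estimate, assuming CPU power = kappa * cpu_hz ** nu."""
    power = kappa * cpu_hz ** nu
    return power * local_time(data_bits, cycles_per_bit, cpu_hz)

def meets_deadline(data_bits, cycles_per_bit, cpu_hz, t_req):
    """True if local execution finishes within the allowed time T_req."""
    return local_time(data_bits, cycles_per_bit, cpu_hz) <= t_req

# Hypothetical task: 1 Mbit, 1000 cycles per bit, 1 GHz CPU.
t = local_time(1e6, 1000, 1e9)
e = local_energy(1e6, 1000, 1e9, kappa=1e-27, nu=3)
print(t, e, meets_deadline(1e6, 1000, 1e9, t_req=1.5))
```

Lowering the CPU frequency lengthens the execution time but reduces the energy under this model, which is the trade-off that the adaptive frequency control mentioned earlier exploits.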
If computing task
Figure PCTCN2022091260-appb-000155
is offloaded to UAV_u for execution, that is,
Figure PCTCN2022091260-appb-000156
the computation time of the task is:
Figure PCTCN2022091260-appb-000157
where
Figure PCTCN2022091260-appb-000158
The corresponding energy consumed is:
Figure PCTCN2022091260-appb-000159
where
Figure PCTCN2022091260-appb-000160
where κ_m and v_m are positive coefficients that depend on the CPU of UAV_u. Note that each computing task
Figure PCTCN2022091260-appb-000161
can be offloaded to only one of the UAVs.
If task
Figure PCTCN2022091260-appb-000162
is executed at the base station, that is,
Figure PCTCN2022091260-appb-000163
then, under the assumption that the base station has ample computing power and energy supply, the execution time of the task is taken to be approximately zero, and the energy consumed by the task is not considered.
3. Establish the communication model:
The communication links of the network system are of two types: links between the mobile devices and the UAVs, and links between the UAVs and the base station. To avoid possible interference among the UAVs, each UAV is assigned an orthogonal communication frequency. Because the UAVs fly at high altitude, the wireless channels between a UAV and a mobile device or the base station are dominated by line-of-sight transmission.
In time slice TS_n, the distance between mobile device MD_m and UAV_u is:
Figure PCTCN2022091260-appb-000164
In time slice TS_n, the distance between UAV_u and the base station is:
Figure PCTCN2022091260-appb-000165
Therefore, the wireless channel gain between mobile device MD_m and UAV_u is:
Figure PCTCN2022091260-appb-000166
The wireless channel gain between UAV_u and the base station is:
Figure PCTCN2022091260-appb-000167
where g_o is the received power gain at the reference distance of 1 meter.
If computing task
Figure PCTCN2022091260-appb-000168
is offloaded from mobile device MD_m to UAV_u, the transmission rate of the task data is:
Figure PCTCN2022091260-appb-000169
If computing task
Figure PCTCN2022091260-appb-000170
is offloaded from UAV_u to the base station, the transmission rate of the task data is:
Figure PCTCN2022091260-appb-000171
where B denotes the bandwidth of the network system,
Figure PCTCN2022091260-appb-000172
and
Figure PCTCN2022091260-appb-000173
denote the wireless transmission powers of mobile device MD_m and UAV_u in time slice TS_n, respectively, and σ² denotes the communication noise power.
Figure PCTCN2022091260-appb-000174
and
Figure PCTCN2022091260-appb-000175
satisfy the following conditions, respectively:
Figure PCTCN2022091260-appb-000176
where
Figure PCTCN2022091260-appb-000177
and
Figure PCTCN2022091260-appb-000178
denote the maximum available transmission powers of mobile device MD_m and UAV_u, respectively.
The time required and the energy consumed for mobile device MD_m to offload a computing task to UAV_u are, respectively:
Figure PCTCN2022091260-appb-000179
Figure PCTCN2022091260-appb-000180
The time required and the energy consumed for UAV_u to offload a computing task to the base station are, respectively:
Figure PCTCN2022091260-appb-000181
Figure PCTCN2022091260-appb-000182
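The communication model above can be sketched in a few lines of Python. This is a minimal illustration, not the formulas from the equation images, which are not reproduced in this text. It assumes the common free-space line-of-sight gain (reference gain divided by squared distance), a Shannon-capacity rate, and transmit energy equal to power times transmission time. All function and parameter names are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance in meters between two 3-D points (x, y, z)."""
    return math.dist(p, q)


def los_channel_gain(g0, d):
    """Assumed line-of-sight gain: reference gain at 1 m divided by d^2."""
    return g0 / (d ** 2)


def shannon_rate(bandwidth_hz, tx_power_w, gain, noise_power_w):
    """Assumed achievable rate in bit/s: B * log2(1 + p * g / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / noise_power_w)


def offload_cost(data_bits, rate_bps, tx_power_w):
    """Transmission time (s) and transmit energy (J) for sending data_bits."""
    t = data_bits / rate_bps
    return t, tx_power_w * t


# Example: a device at the origin and a UAV 50 m away.
d = distance((0.0, 0.0, 0.0), (0.0, 30.0, 40.0))
g = los_channel_gain(1e-3, d)
```

The same four helpers cover both link types (device to UAV and UAV to base station); only the endpoints, transmit power, and data size change.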
Let
Figure PCTCN2022091260-appb-000183
and
Figure PCTCN2022091260-appb-000184
denote the energy budgets of mobile device MD_m and UAV_u, respectively. For
Figure PCTCN2022091260-appb-000185
the following constraints are satisfied:
Figure PCTCN2022091260-appb-000186
Figure PCTCN2022091260-appb-000187
The optimization goal of the network system is to minimize the total energy consumption of the mobile devices and the UAVs subject to the task delay constraint and the system constraints (such as the maximum UAV speed, the minimum distance between UAVs, and the maximum computing capability). When computing task
Figure PCTCN2022091260-appb-000188
is executed on the mobile device, a UAV, or the base station, the corresponding task delays are, respectively:
Figure PCTCN2022091260-appb-000189
After the task scheduling decision variable
Figure PCTCN2022091260-appb-000190
is introduced, the delay of computing task
Figure PCTCN2022091260-appb-000191
can be expressed in unified form as:
Figure PCTCN2022091260-appb-000192
Therefore, the corresponding task execution delay constraint is:
Figure PCTCN2022091260-appb-000193
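The unified delay and the deadline check can be sketched as follows. The per-location decomposition is inferred from the model described above rather than taken from the equation images: local execution costs only computation time, UAV execution costs the uplink plus UAV computation, and base station execution costs two hops with negligible computation time. The function and parameter names are hypothetical.

```python
def task_delay(a, t_local, t_up_uav, t_comp_uav, t_up_bs):
    """Delay of one task for a one-hot decision a = (a_local, a_uav, a_bs)."""
    if sorted(a) != [0, 0, 1]:
        raise ValueError("decision must select exactly one execution location")
    a_local, a_uav, a_bs = a
    d_local = t_local                  # computed on the device itself
    d_uav = t_up_uav + t_comp_uav      # device -> UAV, then UAV computes
    d_bs = t_up_uav + t_up_bs          # device -> UAV -> BS, BS time taken as 0
    return a_local * d_local + a_uav * d_uav + a_bs * d_bs


def meets_deadline(delay, t_max):
    """Delay constraint: the task must finish within its maximum delay."""
    return delay <= t_max
```

Writing the delay as a weighted sum over the binary decision entries mirrors the way the decision variable selects exactly one of the three candidate delays.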
In time slice TS_n, the energy consumed by executing task
Figure PCTCN2022091260-appb-000194
falls into two cases:
1) If task
Figure PCTCN2022091260-appb-000195
is executed locally on mobile device MD_m, that is,
Figure PCTCN2022091260-appb-000196
then the energy consumption of mobile device MD_m is
Figure PCTCN2022091260-appb-000197
2) If task
Figure PCTCN2022091260-appb-000198
is offloaded to UAV_u or the base station for execution, that is,
Figure PCTCN2022091260-appb-000199
then the energy consumption of mobile device MD_m is
Figure PCTCN2022091260-appb-000200
Therefore, the energy consumed by mobile device MD_m in executing computing task
Figure PCTCN2022091260-appb-000201
can be expressed in unified form as:
Figure PCTCN2022091260-appb-000202
The energy consumption of all mobile devices during task execution can be expressed as:
Figure PCTCN2022091260-appb-000203
In summary, to minimize the total energy consumption of all tasks of the mobile devices while the mobile edge computing network system is running, the optimization problem P (the optimization objective function) is defined as follows:
Figure PCTCN2022091260-appb-000204
s.t. C1: Eq(1) and Eq(2),
C2: Eq(3), Eq(4) and Eq(5),
C3: Eq(6) and Eq(7),
C4: Eq(8),
C5: Eq(9) and Eq(10),
C6: Eq(11),
where
Figure PCTCN2022091260-appb-000205
Figure PCTCN2022091260-appb-000206
are the variables to be optimized.
In problem P, constraint C1 requires that the UAVs not exceed their maximum speed and that the minimum distance between UAVs be maintained. Constraint C2 guarantees that the computing task generated by a mobile device in each time slice is executed on exactly one of the mobile device itself, a UAV, or the base station, and that each UAV sends at most one task to the base station in each time slice. Constraint C3 guarantees that the computing resources allocated to local computing and UAV computing in each time slice do not exceed the maximum computing capabilities of the mobile device and the UAV, respectively. Constraint C4 requires that the mobile devices and the UAVs not exceed their respective energy budgets during execution. Constraint C5 requires that the transmit powers allocated to the mobile devices and the UAVs not exceed the maximum allowed values. Constraint C6 guarantees that the execution of each task meets its delay requirement.
Regarding step S450, it should be noted that the specific manner of training the model is not limited and can be set according to actual application requirements. For example, in one alternative example, the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and step S450 may include the following sub-steps:
The optimization objective function is split to obtain a first optimization objective function, a second optimization objective function, and a third optimization objective function; the initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, according to the second optimization objective function to obtain the computing task joint scheduling model, and according to the third optimization objective function to obtain the resource allocation model.
In detail, problem P is difficult to solve for the following reasons: 1) because A is a discrete binary variable while L, P, and F are continuous variables, the problem is a mixed-integer nonlinear programming problem, which is NP-hard; 2) because the network system requires fast response, the scheduling optimization algorithm must make scheduling decisions quickly and in real time in each time slice; 3) because the positions of both the mobile devices and the UAVs change, P must be solvable in a dynamically changing environment. For these reasons, this application decomposes the optimization objective function P into three sub-problems: UAV trajectory planning (P1, the first optimization objective function), joint scheduling of computing tasks (P2, the second optimization objective function), and computing and transmission resource allocation (P3, the third optimization objective function). This yields an efficient scheduling strategy for the mobile edge computing network and greatly reduces the complexity of solving the optimization problem.
To reduce the computational complexity of the original optimization problem, P is split into the following three sub-problems:
1. UAV trajectory planning:
Among the scheduling variables L, A, P, and F to be optimized in problem P, the UAV trajectory position L depends only weakly on the other three variables. Its optimization is based mainly on observations of the mobile devices' positions, and the goal is to bring the UAVs as close as possible to the mobile devices and the base station. The UAV trajectory optimization can therefore be expressed as:
Figure PCTCN2022091260-appb-000207
s.t. C1: Eq(1) and Eq(2),
where
Figure PCTCN2022091260-appb-000208
denotes the cluster of mobile devices within the service range of UAV_u and satisfies the condition
Figure PCTCN2022091260-appb-000209
2. Joint scheduling of computing tasks:
Once the UAV positions L have been determined in time slice TS_n, the task offloading decision variable A must be optimized before the variables P and F. Based on the current mobile device cluster
Figure PCTCN2022091260-appb-000210
A is optimized with the goal of minimizing the maximum computation delay over all tasks,
Figure PCTCN2022091260-appb-000211
which makes constraint C6 of the original problem P easier to satisfy. The joint scheduling sub-problem of computing tasks can therefore be expressed as:
Figure PCTCN2022091260-appb-000212
s.t. C2: Eq(3), Eq(4) and Eq(5),
3. Computing and transmission resource allocation:
After problems P1 and P2 have been solved, the remaining variables P and F are optimized under constraints C3, C4, and C5 with the goal of minimizing the energy consumed in the system:
Figure PCTCN2022091260-appb-000213
s.t. C3: Eq(6) and Eq(7),
C4: Eq(8),
C5: Eq(9) and Eq(10),
Based on the above decomposition, Figure 12 shows the optimization framework proposed in this application. The framework consists of three models: a UAV trajectory planning model (UAV Trajectory Planning, UTP), a computing task joint scheduling model (Task Association Scheduling, TAS), and a computing and transmission resource allocation model (Resource Allocation, RA), which correspond to the optimization sub-problems P1, P2, and P3, respectively. At the beginning of each time slice, the network system environment generates two state variables (
Figure PCTCN2022091260-appb-000214
and
Figure PCTCN2022091260-appb-000215
).
Figure PCTCN2022091260-appb-000216
is the input of the UTP model, and
Figure PCTCN2022091260-appb-000217
is the input of the TAS and RA models.
1) The UTP model processes
Figure PCTCN2022091260-appb-000218
Because the positions of the mobile devices differ from one time slice to the next, the UTP model predicts the movement of the mobile devices and guides the UAVs to appropriate positions. Because the movement of the mobile devices follows neither a Gaussian nor a linear distribution, this application may use a long short-term memory (LSTM) network to model it. After the prediction, the mobile devices are divided into U clusters according to the number of UAVs, so that each UAV serves the mobile devices in its cluster. To allow soft clustering, in which a mobile device may be served by different UAVs in different time slices (but by at most one UAV within a single time slice), the UTP model uses fuzzy C-means (FCM) clustering, grouping the devices by the similarity of their channel power gains. After clustering, the center point of each cluster is output by the UTP module as the target position of the corresponding UAV, namely
Figure PCTCN2022091260-appb-000219
2) The TAS model receives
Figure PCTCN2022091260-appb-000220
and
Figure PCTCN2022091260-appb-000221
from the UTP model and the network environment, respectively. Based on the time-varying channel conditions and the computing task requirements, the TAS model generates the values of the task scheduling decision variables
Figure PCTCN2022091260-appb-000222
This application may use a deep reinforcement learning (DRL) method, the deep deterministic policy gradient (DDPG) algorithm, which gains experience from the interaction between the model and the environment and outputs the optimized decision action a_n. In other alternative examples, other reinforcement learning algorithms suited to continuous actions (such as TD3 or PPO) may also be used. For each time slice, the output action a_n is a one-dimensional vector consisting of
Figure PCTCN2022091260-appb-000223
entries, each of which is a continuous variable relaxed to lie between 0 and 1. Each entry of a_n can be regarded as the probability that
Figure PCTCN2022091260-appb-000224
is executed on computing device k (which is why each entry is a continuous value between 0 and 1). Because the task scheduling decision variables must be two-dimensional and binary, the entries of a_n are reshaped and rounded to 1 or 0 in accordance with the task association constraints of the optimization problem and are output by the TAS model, namely
Figure PCTCN2022091260-appb-000225
3)
Figure PCTCN2022091260-appb-000226
and
Figure PCTCN2022091260-appb-000227
are taken as the input of the RA model for final processing. According to sub-problem P3, the optimization variables P and F can be solved directly with the CVXPY convex optimization toolkit, and the P and F output by the RA model interact with the environment.
The environment receives the actions output by the three models above and produces a reward r_n (used as input to DDPG) and a new state (whose components are sent to the corresponding components of the algorithm framework). The algorithm then enters the next time slice and repeats the three steps above.
It should be noted that the optimal UAV position plan can be computed using the long short-term memory network and fuzzy C-means clustering. UAV trajectory planning consists of two parts: mobile device motion prediction and mobile device clustering.
In the network system, the distance between a UAV and the mobile devices is the main factor affecting the other scheduling variables, so the ideal UAV trajectory moves gradually toward the mobile devices and stays as close to them as possible. To this end, the algorithm proposed in this application predicts the positions of the mobile devices
Figure PCTCN2022091260-appb-000228
to assist the movement of the UAVs. Because the prediction of
Figure PCTCN2022091260-appb-000229
is based mainly on the positions of the mobile devices in previous time slices, this application uses an LSTM recurrent neural network to model the temporal distribution of
Figure PCTCN2022091260-appb-000230
As shown in Figure 13, the long short-term memory (LSTM) network is a recurrent neural network that accepts both an external input
Figure PCTCN2022091260-appb-000231
and feedback inputs (C_{n-1} and h_{n-1}). The output of the LSTM consists of two terms (C_n and h_n), which are fed back into the LSTM itself for processing in the next time slice. Of these two outputs, C_n is obtained by the following operations:
Figure PCTCN2022091260-appb-000232
Figure PCTCN2022091260-appb-000233
Figure PCTCN2022091260-appb-000234
Figure PCTCN2022091260-appb-000235
where f_n, i_n, and
Figure PCTCN2022091260-appb-000236
denote the outputs of neural network layers, σ and tanh denote the sigmoid and hyperbolic tangent activation functions, respectively, W_f, W_i, and W_C denote the weights of the corresponding neural network layers, and b_f, b_i, and b_C denote the corresponding bias vectors. The weights and biases are the parameters the neural network must learn.
Based on C_n, h_n is computed as:
h_n = o_n * tanh(C_n);
Figure PCTCN2022091260-appb-000237
where W_O and b_o are parameters the neural network must learn.
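For orientation, the standard LSTM cell equations that match the symbols named above are given below. The equation images themselves are not reproduced in this text, so this is the textbook form rather than a transcription. The symbol x_n is an assumed stand-in for the external input, and [h_{n-1}, x_n] denotes concatenation.

```latex
\begin{aligned}
f_n &= \sigma\!\left(W_f\,[h_{n-1}, x_n] + b_f\right) \\
i_n &= \sigma\!\left(W_i\,[h_{n-1}, x_n] + b_i\right) \\
\tilde{C}_n &= \tanh\!\left(W_C\,[h_{n-1}, x_n] + b_C\right) \\
C_n &= f_n * C_{n-1} + i_n * \tilde{C}_n \\
o_n &= \sigma\!\left(W_O\,[h_{n-1}, x_n] + b_o\right) \\
h_n &= o_n * \tanh(C_n)
\end{aligned}
```

The forget gate f_n scales the previous cell state, the input gate i_n scales the candidate state, and the output gate o_n determines how much of the updated cell state is exposed as h_n.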
Based on the above formulas, this application proposes an LSTM-based mobile device position observation model to predict the positions of the mobile devices; its unrolling over time is shown in Figure 14. In each time slice, the current positions of the mobile devices are input to the LSTM network, which outputs h_n. To predict the positions of the mobile devices in the next time slice, a fully connected layer is added at the output to fine-tune h_n, as follows:
Figure PCTCN2022091260-appb-000238
where relu is the ReLU activation function, and
Figure PCTCN2022091260-appb-000239
and
Figure PCTCN2022091260-appb-000240
are the parameters the neural network learns during training.
Based on the predicted positions of the mobile devices in the next time slice,
Figure PCTCN2022091260-appb-000241
the mobile devices must be clustered into U groups so that the UAVs can serve them in a load-balanced manner. To cluster the mobile devices, the FCM method, which derives from fuzzy set theory, can be used: for each cluster
Figure PCTCN2022091260-appb-000242
mobile device MD_m is assigned a metric value d_{m,u} in time slice TS_{n+1}, computed as follows:
Figure PCTCN2022091260-appb-000243
where c_u denotes the position of the UAV in the nth time slice and c_k denotes the center point of the kth cluster, namely
Figure PCTCN2022091260-appb-000244
By minimizing the objective function O, the values of d_{m,u} and c_k are solved iteratively until the difference between two successively computed metric values is less than the specified threshold ε_c:
Figure PCTCN2022091260-appb-000245
Before the iteration, every c_u must be initialized. Each c_u is initialized with the value of
Figure PCTCN2022091260-appb-000246
because the mobile devices move only within a small range, so their new center points are likely to be close to the previous ones (which were planned as the UAV positions
Figure PCTCN2022091260-appb-000247
).
After the iteration, each mobile device MD_m has a metric value d_{m,u} representing its membership in the u-th cluster. An exploration strategy can further convert d_{m,u} into a binary clustering decision, which reduces the likelihood that the optimization objective O becomes trapped in a local minimum. With ε_c denoting the exploration threshold, mobile device MD_m is assigned to the cluster with the largest metric value with probability 1 - ε_c and to another cluster with probability ε_c. The algorithm in Figure 15 (Algorithm 1) describes in detail the FCM-based clustering of the mobile devices in the nth time slice. The output c_u of Algorithm 1 guides the UAV to move to
Figure PCTCN2022091260-appb-000248
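The clustering step can be illustrated with a minimal fuzzy C-means sketch in plain Python. It is not Algorithm 1 from Figure 15, which is not reproduced in this text. As a simplifying assumption it clusters on Euclidean distance between 2-D positions rather than on channel power gain, uses the textbook membership and center updates with fuzzifier m, and adds an ε-greedy hard assignment. All names are hypothetical.

```python
import math
import random


def fcm(points, centers, m=2.0, eps=1e-4, max_iter=100):
    """Fuzzy C-means on 2-D points; returns (memberships, centers).

    memberships[j][i] is the degree to which point j belongs to cluster i.
    """
    centers = [tuple(c) for c in centers]
    n_clusters = len(centers)
    exponent = 2.0 / (m - 1.0)
    prev = None
    u = []
    for _ in range(max_iter):
        # Membership update: inverse-distance weighting with fuzzifier m.
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            u.append([
                1.0 / sum((d[i] / d[k]) ** exponent for k in range(n_clusters))
                for i in range(n_clusters)
            ])
        # Center update: mean of the points weighted by membership**m.
        new_centers = []
        for i in range(n_clusters):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[axis] for wj, p in zip(w, points)) / total
                for axis in range(2)
            ))
        centers = new_centers
        # Stop once no membership changed by eps or more since the last pass.
        if prev is not None:
            change = max(abs(a - b)
                         for ra, rb in zip(u, prev) for a, b in zip(ra, rb))
            if change < eps:
                break
        prev = u
    return u, centers


def hard_assign(row, eps_explore, rng=random):
    """Pick the best cluster with prob. 1 - eps_explore, else another one."""
    best = max(range(len(row)), key=row.__getitem__)
    others = [i for i in range(len(row)) if i != best]
    if others and rng.random() < eps_explore:
        return rng.choice(others)
    return best
```

Seeding `centers` with the previous UAV positions, as the text suggests, usually shortens the iteration because the devices move only a short distance between time slices.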
It should be noted that the task scheduling decision variables of each mobile device can be obtained with the reinforcement-learning-based deep deterministic policy gradient algorithm. The joint scheduling of computing tasks consists of two parts: DDPG-based optimization of the task scheduling decision variables and integration of the scheduling variables. Once the UAV trajectories are known, the algorithm framework uses the DDPG reinforcement learning algorithm to learn the scheduling policy for computing tasks, namely:
Figure PCTCN2022091260-appb-000249
The policy π is a mapping from environment states to decision actions. The state of the network environment is:
Figure PCTCN2022091260-appb-000250
The decision action output by the policy π is:
Figure PCTCN2022091260-appb-000251
Each component of a_n is a continuous variable between 0 and 1, and the size of a_n is:
Figure PCTCN2022091260-appb-000252
Through reinforcement learning, an approximately optimal policy π can be obtained by maximizing the total utility value (also called the Q value):
Figure PCTCN2022091260-appb-000253
where s_{n+1} is the new state of the environment after decision action a_n is taken in state s_n, r_n(s_n, a_n) = (ε_n)^{-1} is the immediate reward in time slice TS_n, and γ is the discount factor for future rewards. Because the state and action spaces of the environment are high-dimensional, two neural networks are used: an actor network π (with parameters ω) and a critic network Q (with parameters θ), as shown in Figure 16. To make learning more stable, target networks can be used (a target policy network and a target critic network, with parameters
Figure PCTCN2022091260-appb-000254
and
Figure PCTCN2022091260-appb-000255
respectively), whose parameters are updated periodically.
In time slice TS_n, after accepting the action a_n output by the algorithm model, the environment transitions from state s_n to state s_{n+1} and produces a reward r_n. These four items are packed into a tuple (s_n, a_n, s_{n+1}, r_n) and stored in an experience replay buffer. During training, a batch of samples is drawn at random from the replay buffer, and the critic network (that is, the parameters θ) is trained according to the following loss function:
Figure PCTCN2022091260-appb-000256
The actor network trains its parameters by minimizing the following gradient function:
Figure PCTCN2022091260-appb-000257
where
Figure PCTCN2022091260-appb-000258
is a state sampled from the state distribution under the current policy π, and
Figure PCTCN2022091260-appb-000259
is the number of samples in a batch for backpropagation training. The DDPG-based task joint scheduling training algorithm is detailed in Figure 17.
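For reference, the usual DDPG critic loss and actor gradient take the form below. This is the standard formulation from the DDPG literature, written with the parameter names used in the text; it is not a transcription of the equation images. The primed symbols Q', π', ω', θ' for the target networks and the batch size N are assumed notation.

```latex
\begin{aligned}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1},\, \pi'(s_{i+1} \mid \omega') \,\middle|\, \theta'\right) \\
L(\theta) &= \frac{1}{N} \sum_{i=1}^{N} \left(y_i - Q(s_i, a_i \mid \theta)\right)^2 \\
\nabla_{\omega} J &\approx \frac{1}{N} \sum_{i=1}^{N}
  \nabla_{a} Q(s, a \mid \theta)\big|_{s=s_i,\, a=\pi(s_i \mid \omega)}\;
  \nabla_{\omega}\, \pi(s \mid \omega)\big|_{s=s_i}
\end{aligned}
```

The critic regresses toward the target value y_i computed with the slowly changing target networks, while the actor follows the critic's gradient with respect to the action.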
Because the decision action a_n output by the actor network is a one-dimensional vector whose entries are continuous values in the range 0 to 1, a_n must be reshaped into two dimensions and rounded to 0 or 1 before it can be used for task scheduling. Figure 18 shows the reshaping and integration algorithm for a_n (Algorithm 3), whose time complexity is
Figure PCTCN2022091260-appb-000260
After this reshaping and integration of the task scheduling variables, the output a[m][k] of Algorithm 3 is passed to the RA module for optimized resource allocation.
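The reshaping step can be sketched as follows. This is a simplified stand-in for Algorithm 3, which is not reproduced in this text: it only enforces that each device selects exactly one execution target by taking the largest entry in its row, and it does not handle the additional constraint that each UAV forwards at most one task to the base station. The function and parameter names are hypothetical.

```python
def reshape_and_integrate(a_n, num_devices, num_targets):
    """Reshape a flat action into an M x K matrix and one-hot each row."""
    if len(a_n) != num_devices * num_targets:
        raise ValueError("action length must equal num_devices * num_targets")
    decisions = []
    for m in range(num_devices):
        row = a_n[m * num_targets:(m + 1) * num_targets]
        # Index of the largest entry; the first one wins on ties.
        best = max(range(num_targets), key=row.__getitem__)
        decisions.append([1 if k == best else 0 for k in range(num_targets)])
    return decisions


# Two devices, three candidate targets each.
a = reshape_and_integrate([0.1, 0.7, 0.2, 0.6, 0.3, 0.1], 2, 3)
```

In this example the first device is assigned to its second target and the second device to its first target.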
It should be noted that a method based on convex optimization can be used to determine the allocation of computing and transmission resources in the network system.
Figure PCTCN2022091260-appb-000261
and
Figure PCTCN2022091260-appb-000262
serve as inputs to the RA module for final processing. According to sub-problem P3, the optimization variables P and F can be solved directly by convex optimization using the CVXPY tool.
Regarding step S420, it should be noted that the specific manner of obtaining the scheduling strategy is not limited and can be set according to actual application requirements. For example, in an alternative example, step S420 may include the following sub-steps:
inputting the current location information into the UAV trajectory planning model to compute predicted location information of the at least one mobile device; inputting the pending tasks and the predicted location information into the joint task scheduling model to compute task scheduling decision variables of the at least one mobile device; and inputting the pending tasks and the task scheduling decision variables into the resource allocation model to compute the scheduling strategy.
The specific manner of inputting the current location information into the UAV trajectory planning model to compute the predicted location information of the at least one mobile device is not limited and can be set according to actual application requirements. For example, in an alternative example, this step may include the following sub-steps:
performing motion prediction processing according to the current location information to obtain next location information of the at least one mobile device; and performing clustering processing on the next location information of the at least one mobile device to obtain the predicted location information.
It should be noted that, for the steps of prediction processing and clustering processing, reference may be made to the process of training the UAV trajectory planning model described above.
The specific manner of inputting the pending tasks and the predicted location information into the joint task scheduling model to compute the task scheduling decision variables of the at least one mobile device is not limited and can be set according to actual application requirements. For example, in an alternative example, this step may include the following sub-steps:
performing joint task scheduling training processing according to the pending tasks and the predicted location information to obtain decision actions of the at least one mobile device; and integrating the decision actions to obtain the task scheduling decision variables.
It should be noted that, for the steps of training processing and integration processing, reference may be made to the process of training the joint computing-task scheduling model described above.
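The three sub-steps of step S420 can be sketched as a simple chain of three models. The function `compute_scheduling_strategy` and the three toy stand-in models below are hypothetical placeholders introduced for illustration; the trained UAV trajectory planning, joint task scheduling, and resource allocation models of the application are not reproduced here, and the task fields `id` and `cycles` are likewise assumed.

```python
def compute_scheduling_strategy(tasks, current_positions,
                                trajectory_model, scheduling_model,
                                allocation_model):
    """Chain the three models in the order given for step S420."""
    predicted_positions = trajectory_model(current_positions)
    decision_variables = scheduling_model(tasks, predicted_positions)
    return allocation_model(tasks, decision_variables)


# --- Toy stand-ins (assumptions, not the trained models) -------------------

def toy_trajectory_model(positions):
    """Return one cluster centre: the mean of the device positions."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    return [(sum(xs) / len(xs), sum(ys) / len(ys))]


def toy_scheduling_model(tasks, predicted_positions):
    """Offload (1) any task needing more than 1.0 Gcycles, else keep local (0)."""
    return [1 if t["cycles"] > 1.0 else 0 for t in tasks]


def toy_allocation_model(tasks, decisions):
    """Split one unit of remote CPU evenly among the offloaded tasks."""
    offloaded = sum(decisions)
    share = 1.0 / offloaded if offloaded else 0.0
    return [
        {"task": t["id"], "offload": bool(d), "cpu_share": share if d else 0.0}
        for t, d in zip(tasks, decisions)
    ]


tasks = [{"id": "t0", "cycles": 0.5},
         {"id": "t1", "cycles": 2.0},
         {"id": "t2", "cycles": 3.0}]
positions = [(0.0, 0.0), (2.0, 0.0), (4.0, 3.0)]

strategy = compute_scheduling_strategy(
    tasks, positions,
    toy_trajectory_model, toy_scheduling_model, toy_allocation_model)
```

Passing the models in as callables keeps the chaining logic independent of how each model is trained or implemented.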
With the above method, the present application deploys a mobile edge computing network composed of a single base station, multiple UAVs, and a large number of mobile devices. Each computing task can be executed on the mobile device, offloaded to a UAV for computation, or further offloaded to the base station, with the UAV acting as a relay, for more intensive computation. With the goal of minimizing the energy consumption of the network system, a joint optimization problem over UAV trajectory, task association, and computing and transmission resource allocation is formulated. Given the high complexity of the problem, the present application decomposes the optimization problem into three sub-problems, which greatly reduces the energy consumption of the overall network system, prolongs the lifetime of the network, reduces the computing delay of all mobile devices in the communication network, and improves the quality of service of computing-intensive applications.
Please refer to FIG. 19, which is a schematic block diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 in this embodiment may be a server, processing device, processing platform, or the like capable of data interaction and processing. The electronic device 100 includes a first memory 110, a first processor 120, and a communication module 130. The first memory 110, the first processor 120, and the communication module 130 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to one another through one or more communication buses or signal lines.
The first memory 110 is used to store programs or data. The first memory 110 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.
The first processor 120 is used to read and write the data or programs stored in the first memory 110 and to perform the corresponding functions. The communication module 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through a network, and to send and receive data through the network.
It should be understood that the structure shown in FIG. 19 is only a schematic structural diagram of the electronic device 100. The electronic device 100 may include more or fewer components than those shown in FIG. 19, or have a configuration different from that shown in FIG. 19. Each component shown in FIG. 19 may be implemented in hardware, software, or a combination thereof.
With reference to FIG. 20, an embodiment of the present application further provides a task offloading apparatus 400, whose functions correspond to the steps performed by the above task offloading method. The task offloading apparatus 400 can be understood as the processor of the above electronic device 100, or as a component independent of the electronic device 100 or the processor that implements the functions of the present application under the control of the electronic device 100. The task offloading apparatus 400 may include a task acquisition module 410, an offloading strategy acquisition module 420, and an offloading strategy sending module 430.
The task acquisition module 410 may be configured to acquire pending tasks of at least one first device 210, wherein the pending tasks include a target task. In this embodiment of the present application, the task acquisition module 410 may be used to execute step S310 shown in FIG. 3; for details of the task acquisition module 410, refer to the foregoing description of step S310.
The offloading strategy acquisition module 420 may be configured to input the pending tasks into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model. In this embodiment of the present application, the offloading strategy acquisition module 420 may be used to execute step S320 shown in FIG. 3; for details of the offloading strategy acquisition module 420, refer to the foregoing description of step S320.
The offloading strategy sending module 430 may be configured to send the task offloading strategy to the at least one first device 210, so that the at least one first device 210 offloads the target task to the second device 220 based on the task offloading strategy, and the second device 220 executes the target task. In this embodiment of the present application, the offloading strategy sending module 430 may be used to execute step S330 shown in FIG. 3; for details of the offloading strategy sending module 430, refer to the foregoing description of step S330.
With reference to FIG. 21, some other implementations of the embodiments of the present application further provide a scheduling optimization apparatus 500. It should be understood that the scheduling optimization apparatus described in these other implementations may be implemented as the task offloading apparatus described in some implementations of the present application. In addition, it can be understood that the functions implemented by the scheduling optimization apparatus 500 correspond to the steps performed by the above scheduling optimization method. In some other implementations according to the present application, the task acquisition module 510 may be configured to acquire pending tasks and current location information of at least one mobile device, wherein the pending tasks include a first task and a second task. In this embodiment of the present application, the task acquisition module 510 may be used to execute step S410 shown in FIG. 10; for details of the task acquisition module 510, refer to the foregoing description of step S410.
The scheduling strategy acquisition module 520 may be configured to input the pending tasks and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on an established initial model. In this embodiment of the present application, the scheduling strategy acquisition module 520 may be used to execute step S420 shown in FIG. 10; for details of the scheduling strategy acquisition module 520, refer to the foregoing description of step S420.
The scheduling strategy sending module 530 may be configured to send the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through the at least one UAV to at least one base station for processing. In this embodiment of the present application, the scheduling strategy sending module 530 may be used to execute step S430 shown in FIG. 10; for details of the scheduling strategy sending module 530, refer to the foregoing description of step S430.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When run by a processor, the computer program executes the steps of the above task offloading method and/or the above scheduling optimization method.
The computer program product of the task offloading method provided in the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the steps of the task offloading method and/or the scheduling optimization method in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
In summary, the task offloading method and apparatus, electronic device, and storage medium provided by some embodiments of the present application obtain a task offloading strategy by inputting pending tasks into a task offloading model, and send the task offloading strategy to the first device, so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to a server for processing, which avoids the low offloading efficiency of the related art, in which tasks are either all executed locally on the wireless user equipment or all offloaded to the server for remote execution.
The scheduling optimization method and apparatus, electronic device, and storage medium provided by other embodiments of the present application obtain a scheduling strategy by inputting pending tasks and current location information into a preset scheduling optimization model, and send the scheduling strategy to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through the at least one UAV to at least one base station for processing. The first task is thereby scheduled to a UAV and the second task to a base station, which avoids the low scheduling optimization efficiency of the related art, in which tasks are either all executed locally on the mobile device or all scheduled to a UAV or base station for remote execution.
The above are only preferred embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Industrial Applicability
The present application provides a task offloading method and apparatus, an electronic device, and a storage medium, and relates to the technical field of task offloading. The task offloading method is applied to an electronic device that is communicatively connected to a task offloading system, and the task offloading system includes a second device and at least one first device. The task offloading method includes: first, acquiring pending tasks of the at least one first device; second, inputting the pending tasks into a preset task offloading model to obtain a task offloading strategy; and then sending the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task. The above method can improve the efficiency of task offloading. In addition, the embodiments of the present application further provide a scheduling optimization method and apparatus, an electronic device, and a storage medium.
In addition, it can be understood that the task offloading method, scheduling optimization method and apparatus, electronic device, and storage medium of the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in the technical field of task offloading and scheduling optimization.

Claims (20)

  1. A task offloading method, characterized in that the task offloading method is applied to an electronic device, the electronic device is communicatively connected to a task offloading system, the task offloading system comprises a second device and at least one first device, and the task offloading method comprises:
    acquiring pending tasks of the at least one first device, wherein the pending tasks include a target task;
    inputting the pending tasks into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;
    sending the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task.
  2. The task offloading method according to claim 1, characterized in that the task offloading method further comprises a step of obtaining the task offloading model, the step comprising:
    establishing a system model and an optimization cost function according to cost parameters of the task offloading system;
    training the system model according to the optimization cost function to obtain the task offloading model.
  3. The task offloading method according to claim 2, characterized in that the step of establishing a system model and an optimization cost function according to cost parameters of the task offloading system comprises:
    establishing the system model according to cost parameters of the at least one first device and the second device;
    establishing the optimization cost function according to the system model.
  4. The task offloading method according to claim 3, characterized in that the task offloading model comprises a first task offloading model and a second task offloading model, and the step of training the system model according to the optimization cost function to obtain the task offloading model comprises:
    splitting the optimization cost function to obtain a first optimization cost function and a second optimization cost function;
    training the system model according to the first optimization cost function to obtain the first task offloading model;
    training the system model according to the second optimization cost function to obtain the second task offloading model.
  5. The task offloading method according to claim 4, characterized in that the task offloading strategy comprises a first task offloading strategy and a second task offloading strategy, and the step of inputting the pending tasks into a preset task offloading model to obtain a task offloading strategy comprises:
    inputting the pending tasks into the first task offloading model to obtain the first task offloading strategy;
    inputting the pending tasks into the second task offloading model to obtain the second task offloading strategy.
  6. The task offloading method according to claim 4 or 5, characterized in that the step of training the system model according to the first optimization cost function to obtain the first task offloading model comprises:
    establishing a deep reinforcement learning model based on the system model;
    training the deep reinforcement learning model according to the first optimization cost function to obtain the first task offloading model.
  7. The task offloading method according to any one of claims 4 to 6, characterized in that the step of training the system model according to the second optimization cost function to obtain the second task offloading model comprises:
    establishing an alternating direction method of multipliers model based on the system model;
    training the alternating direction method of multipliers model according to the second optimization cost function to obtain the second task offloading model.
  8. A task offloading apparatus, characterized in that the task offloading apparatus is applied to an electronic device, the electronic device is communicatively connected to a task offloading system, the task offloading system comprises a second device and at least one first device, and the task offloading apparatus comprises:
    a task acquisition module, configured to acquire pending tasks of the at least one first device, wherein the pending tasks include a target task;
    an offloading strategy acquisition module, configured to input the pending tasks into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;
    an offloading strategy sending module, configured to send the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task.
  9. A scheduling optimization method, characterized in that the scheduling optimization method is applied to an electronic device, the electronic device is communicatively connected to a mobile edge computing network system, the mobile edge computing network system comprises at least one base station, UAV, and mobile device, and the scheduling optimization method comprises:
    acquiring pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;
    inputting the pending tasks and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on an established initial model;
    sending the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to the at least one UAV for processing and forwards the second task through the at least one UAV to the at least one base station for processing.
  10. The scheduling optimization method according to claim 9, characterized in that the scheduling optimization method is implemented using the task offloading method according to any one of claims 1 to 7.
  11. The scheduling optimization method according to claim 9 or 10, characterized in that the scheduling optimization method further comprises a step of obtaining the scheduling optimization model, the step comprising:
    establishing an initial model and an optimization objective function according to initial parameters of the mobile edge computing network system;
    training the initial model according to the optimization objective function to obtain the scheduling optimization model.
  12. The scheduling optimization method according to claim 11, characterized in that the step of establishing an initial model and an optimization objective function according to initial parameters of the mobile edge computing network system comprises:
    establishing the initial model according to initial parameters of the at least one base station, UAV, and mobile device;
    establishing the optimization objective function according to the initial model.
  13. The scheduling optimization method according to claim 11 or 12, characterized in that the scheduling optimization model comprises a UAV trajectory planning model, a joint computing-task scheduling model, and a resource allocation model, and the step of training the initial model according to the optimization objective function to obtain the scheduling optimization model comprises:
    splitting the optimization objective function to obtain a first optimization objective function, a second optimization objective function, and a third optimization objective function;
    training the initial model according to the first optimization objective function to obtain the UAV trajectory planning model, training the initial model according to the second optimization objective function to obtain the joint computing-task scheduling model, and training the initial model according to the third optimization objective function to obtain the resource allocation model.
  14. The scheduling optimization method according to claim 13, characterized in that the step of inputting the pending tasks and the current location information into a preset scheduling optimization model to obtain a scheduling strategy comprises:
    inputting the current location information into the UAV trajectory planning model to compute predicted location information of the at least one mobile device;
    inputting the pending tasks and the predicted location information into the joint task scheduling model to compute task scheduling decision variables of the at least one mobile device;
    inputting the pending tasks and the task scheduling decision variables into the resource allocation model to compute the scheduling strategy.
  15. The scheduling optimization method according to claim 14, characterized in that the step of inputting the current location information into the UAV trajectory planning model to compute predicted location information of the at least one mobile device comprises:
    performing motion prediction processing according to the current location information to obtain next location information of the at least one mobile device;
    performing clustering processing on the next location information of the at least one mobile device to obtain the predicted location information.
  16. The scheduling optimization method according to claim 14 or 15, characterized in that the step of inputting the pending tasks and the predicted location information into the joint task scheduling model to compute task scheduling decision variables of the at least one mobile device comprises:
    performing joint task scheduling training processing according to the pending tasks and the predicted location information to obtain decision actions of the at least one mobile device;
    integrating the decision actions to obtain the task scheduling decision variables.
  17. A scheduling optimization apparatus, applied to an electronic device that is communicatively connected to a mobile edge computing network system, the mobile edge computing network system comprising at least one base station, at least one unmanned aerial vehicle (UAV), and at least one mobile device, the scheduling optimization apparatus comprising:
    a task acquisition module configured to acquire pending tasks and current location information of the at least one mobile device, wherein the pending tasks comprise a first task and a second task;
    a scheduling strategy acquisition module configured to input the pending tasks and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training an established initial model; and
    a scheduling strategy sending module configured to send the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to the at least one UAV for processing and forwards the second task via the at least one UAV to the at least one base station for processing.
  18. The scheduling optimization apparatus according to claim 17, wherein the scheduling optimization apparatus is implemented as the task offloading apparatus according to claim 8.
  19. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the task offloading method according to any one of claims 1 to 7 and the scheduling optimization method according to any one of claims 9 to 16.
  20. A storage medium, comprising a computer program, wherein the computer program, when run, controls an electronic device in which the storage medium is located to execute the task offloading method according to any one of claims 1 to 7 and the scheduling optimization method according to any one of claims 9 to 16.
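
For illustration only, the two steps recited in claim 15 (motion prediction followed by clustering) might be sketched as below. The claims do not specify either algorithm, so this sketch assumes constant-velocity extrapolation for the motion prediction and plain k-means for the clustering. The function names, the use of a previous position sample, and the 2-D coordinate representation are all hypothetical and are not taken from the application.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed 2-D ground coordinates of a mobile device


def predict_next(prev: List[Point], curr: List[Point]) -> List[Point]:
    """Assumed motion model: constant velocity, next = curr + (curr - prev)."""
    return [(2 * cx - px, 2 * cy - py) for (px, py), (cx, cy) in zip(prev, curr)]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        groups: List[List[Point]] = [[] for _ in range(k)]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            groups[dists.index(min(dists))].append((x, y))
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def predicted_locations(prev: List[Point], curr: List[Point], k: int) -> List[Point]:
    """Claim-15-style pipeline: predict next device positions, then cluster them."""
    return kmeans(predict_next(prev, curr), k)
```

In this sketch the returned cluster centers stand in for the "predicted location information"; a UAV trajectory planner could, for example, treat them as candidate hover points, although the application may define that quantity differently.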
PCT/CN2022/091260 2021-05-18 2022-05-06 Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium WO2022242468A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110537588.4A CN112988285B (en) 2021-05-18 2021-05-18 Task unloading method and device, electronic equipment and storage medium
CN202110537588.4 2021-05-18
CN202110765005.3A CN113254188B (en) 2021-07-07 2021-07-07 Scheduling optimization method and device, electronic equipment and storage medium
CN202110765005.3 2021-07-07

Publications (1)

Publication Number Publication Date
WO2022242468A1 true WO2022242468A1 (en) 2022-11-24

Family

ID=84140273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091260 WO2022242468A1 (en) 2021-05-18 2022-05-06 Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium

Country Status (1)

Country Link
WO (1) WO2022242468A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116017472A (en) * 2022-12-07 2023-04-25 中南大学 Unmanned aerial vehicle track planning and resource allocation method for emergency network
CN116647880A (en) * 2023-07-26 2023-08-25 国网冀北电力有限公司 Base station cooperation edge computing and unloading method and device for differentiated power service

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180109590A1 (en) * 2016-10-18 2018-04-19 Huawei Technologies Co., Ltd. Virtual Network State Management in Mobile Edge Computing
CN110543336A (en) * 2019-08-30 2019-12-06 北京邮电大学 Edge calculation task unloading method and device based on non-orthogonal multiple access technology
CN111093255A (en) * 2019-12-26 2020-05-01 苏州电海智能科技有限公司 Electric power pack energy supply base station cooperation method based on UAV edge processing
US20200220905A1 (en) * 2019-01-03 2020-07-09 Samsung Electronics Co., Ltd. Electronic device providing ip multimedia subsystem (ims) service in network environment supporting mobile edge computing (mec)
CN111580889A (en) * 2020-05-13 2020-08-25 长沙理工大学 Method, device and equipment for unloading tasks of edge server and storage medium
CN111600648A (en) * 2020-05-25 2020-08-28 中国矿业大学 Mobile relay position control method of mobile edge computing system
CN112381265A (en) * 2020-10-19 2021-02-19 长沙理工大学 Unmanned aerial vehicle-based charging and task unloading system and task time consumption optimization method thereof
CN112422644A (en) * 2020-11-02 2021-02-26 北京邮电大学 Method and system for unloading computing tasks, electronic device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116017472A (en) * 2022-12-07 2023-04-25 中南大学 Unmanned aerial vehicle track planning and resource allocation method for emergency network
CN116017472B (en) * 2022-12-07 2024-04-19 中南大学 Unmanned aerial vehicle track planning and resource allocation method for emergency network
CN116647880A (en) * 2023-07-26 2023-08-25 国网冀北电力有限公司 Base station cooperation edge computing and unloading method and device for differentiated power service
CN116647880B (en) * 2023-07-26 2023-10-13 国网冀北电力有限公司 Base station cooperation edge computing and unloading method and device for differentiated power service

Similar Documents

Publication Publication Date Title
Liu et al. Path planning for UAV-mounted mobile edge computing with deep reinforcement learning
Zhou et al. Deep reinforcement learning for delay-oriented IoT task scheduling in SAGIN
Do et al. Deep reinforcement learning for energy-efficient federated learning in UAV-enabled wireless powered networks
WO2022242468A1 (en) Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium
CN110730031B (en) Unmanned aerial vehicle track and resource allocation joint optimization method for multi-carrier communication
CN113254188B (en) Scheduling optimization method and device, electronic equipment and storage medium
WO2022199032A1 (en) Model construction method, task allocation method, apparatus, device, and medium
CN113543176A (en) Unloading decision method of mobile edge computing system based on assistance of intelligent reflecting surface
Chen et al. Resource awareness in unmanned aerial vehicle-assisted mobile-edge computing systems
Fan et al. Ris-assisted uav for fresh data collection in 3d urban environments: A deep reinforcement learning approach
Li et al. Deep-graph-based reinforcement learning for joint cruise control and task offloading for aerial edge internet of things (edgeiot)
CN113905347B (en) Cloud edge end cooperation method for air-ground integrated power Internet of things
Liu et al. Energy-efficient space–air–ground integrated edge computing for internet of remote things: A federated DRL approach
CN114884949B (en) Task unloading method for low-orbit satellite Internet of things based on MADDPG algorithm
CN112988285B (en) Task unloading method and device, electronic equipment and storage medium
Nguyen et al. DRL-based intelligent resource allocation for diverse QoS in 5G and toward 6G vehicular networks: a comprehensive survey
Dai et al. Mobile crowdsensing for data freshness: A deep reinforcement learning approach
Zhu et al. Fairness-aware task loss rate minimization for multi-UAV enabled mobile edge computing
Wang et al. Digital twin-enabled computation offloading in UAV-assisted MEC emergency networks
CN115766478A (en) Unloading method of air-ground cooperative edge computing server
CN115580900A (en) Unmanned aerial vehicle assisted cooperative task unloading method based on deep reinforcement learning
CN115002123A (en) Fast adaptive task unloading system and method based on mobile edge calculation
Liao et al. Learning-based queue-aware task offloading and resource allocation for air-ground integrated PIoT
Wang et al. Trajectory planning of UAV-enabled data uploading for large-scale dynamic networks: A trend prediction based learning approach
Dong et al. Deep Progressive Reinforcement Learning-Based Flexible Resource Scheduling Framework for IRS and UAV-Assisted MEC System

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22803792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE