WO2022242468A1

WO2022242468A1 - Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022242468A1
Application number: PCT/CN2022/091260
Authority: WO
Inventors: 任涛; 胡哲源; 谷宁波; 牛建伟; 胡舒程; 李青锋
Original assignee: 北京航空航天大学杭州创新研究院
Priority date: 2021-05-18
Filing date: 2022-05-06
Publication date: 2022-11-24

Abstract

A task offloading method and apparatus, an electronic device, and a storage medium. The task offloading method comprises: S310, acquiring a task to be processed of at least one first device; S320, inputting said task into a preset task offloading model, so as to obtain a task offloading strategy; and S330, sending the task offloading strategy to the at least one first device, such that the at least one first device offloads a target task to a second device on the basis of the task offloading strategy, and the second device then processes the target task.

Description

Task offloading method, scheduling optimization method and device, electronic device and storage medium

Cross References to Related Applications

This application claims priority to Chinese patent application No. 202110537588.4, titled "Task offloading method and device, electronic equipment and storage medium", filed with the State Intellectual Property Office of China on May 18, 2021, and to Chinese patent application No. 202110765005.3, titled "Scheduling optimization method and device, electronic equipment and storage medium", filed with the State Intellectual Property Office of the People's Republic of China in July 2021, the entire contents of both of which are incorporated by reference in this application.

Technical field

The present application relates to the technical field of task offloading and scheduling optimization, and in particular, to a task offloading method, a scheduling optimization method and device, electronic equipment, and a storage medium.

Background

One of the key problems to be solved in a mobile edge computing network is the computation offloading problem: whether the wireless user equipment offloads computing tasks to a nearby server or executes them locally, and how to allocate resources (such as computing resources and energy resources) to the tasks offloaded to the server.

However, the inventors have found through research that in related technologies, all tasks are either executed locally on the wireless user equipment, or all tasks are offloaded and executed remotely on the server, thus there is a problem of low efficiency of task offloading.

In addition, when network infrastructure is unavailable (such as in a natural disaster rescue scene), when network equipment is sparsely distributed (such as in a field operation environment), or when a temporary surge of mobile devices far exceeds the network service capacity (such as at a large sporting event or gathering), UAVs can be used as communication relay stations or edge computing platforms. In the field of UAV-assisted mobile edge computing, the scheduling decision for computing tasks in the mobile edge computing network (whether a computing task is executed locally on the mobile device or dispatched to the UAV or base station) must be made properly in order to obtain the desired performance.

However, the inventors have found through research that in related technologies, tasks are either all executed locally on mobile devices, or all tasks are dispatched to drones or base stations for remote execution, so there is a problem of low scheduling optimization efficiency.

Summary of the invention

In view of this, one aspect of the present application provides a task offloading method and device, electronic equipment, and a storage medium, so as to improve the problems existing in related technologies.

In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:

One aspect of the present application provides a task offloading method. The task offloading method is applied to an electronic device, the electronic device is communicatively connected to a task offloading system, and the task offloading system includes a second device and at least one first device. The task offloading method may include:

Acquiring pending tasks of the at least one first device, wherein the pending tasks include target tasks;

Inputting the task to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;

sending the task offloading policy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading policy, and the second device executes the target task.

In an optional implementation manner, the task offloading method may also include the step of obtaining a task offloading model, which may include:

Establishing a system model and an optimization cost function according to the cost parameters of the task offloading system;

The system model is trained according to the optimization cost function to obtain a task offloading model.

In an optional embodiment, the step of establishing a system model and an optimization cost function according to the cost parameters of the task offloading system may include:

building a system model based on cost parameters of said at least one first device and second device;

An optimization cost function is established based on the system model.

In an optional embodiment, the task offloading model includes a first task offloading model and a second task offloading model, and the step of training the system model according to the optimization cost function to obtain the task offloading model may include:

performing segmentation processing on the optimization cost function to obtain a first optimization cost function and a second optimization cost function;

Train the system model according to the first optimization cost function to obtain a first task offloading model;

The system model is trained according to the second optimization cost function to obtain a second task offloading model.

In an optional embodiment, the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, and the step of inputting the task to be processed into a preset task offloading model to obtain a task offloading strategy may include:

inputting the pending task into the first task offloading model to obtain a first task offloading strategy;

The task to be processed is input into the second task offloading model to obtain a second task offloading policy.

In an optional implementation manner, the step of training the system model according to the first optimization cost function to obtain a first task offloading model may include:

Establishing a deep reinforcement learning model based on the system model;

The deep reinforcement learning model is trained according to the first optimized cost function to obtain a first task offloading model.

In an optional implementation manner, the step of training the system model according to the second optimization cost function to obtain a second task offloading model may include:

Establishing an alternating direction multiplier method model based on the system model;

The alternating direction multiplier method model is trained according to the second optimization cost function to obtain a second task offloading model.
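As an illustration of the kind of iteration the alternating direction multiplier method (ADMM) performs, the following is a minimal sketch on a toy problem that is not taken from this application: a shared budget is split across devices so that each allocation stays as close as possible to a requested amount. The function names, the quadratic objective, the budget constraint and the parameter values are all assumptions chosen for illustration only; the second optimization cost function of the application is not reproduced here.

```python
import numpy as np

def project_to_budget(v, budget):
    """Euclidean projection of v onto {z >= 0, sum(z) == budget}."""
    sorted_desc = np.sort(v)[::-1]
    shifted_cumsum = np.cumsum(sorted_desc) - budget
    counts = np.arange(1, len(v) + 1)
    # Largest index whose entry stays positive after the common shift.
    last = np.nonzero(sorted_desc - shifted_cumsum / counts > 0)[0][-1]
    theta = shifted_cumsum[last] / (last + 1)
    return np.maximum(v - theta, 0.0)

def admm_allocate(demand, budget, rho=1.0, iters=200):
    """Toy ADMM: minimise 0.5*||x - demand||^2 s.t. x >= 0, sum(x) == budget.

    The problem is split as f(x) + g(z) with x == z, where f is the
    quadratic term and g is the indicator of the budget constraint.
    """
    x = np.zeros_like(demand, dtype=float)
    z = np.zeros_like(x)
    u = np.zeros_like(x)  # scaled dual variable
    for _ in range(iters):
        x = (demand + rho * (z - u)) / (1.0 + rho)  # x-update (closed form)
        z = project_to_budget(x + u, budget)         # z-update (projection)
        u = u + x - z                                # dual update
    return z

# Hypothetical demands of three devices sharing one unit of resource.
demand = np.array([0.5, 0.2, 0.9])
allocation = admm_allocate(demand, budget=1.0)
```

In this sketch the two sub-problems are solved alternately and coupled only through the dual variable, which is the general structure that makes ADMM attractive when a cost function can be separated into simpler parts.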

The present application also provides a task offloading device. The task offloading device is applied to an electronic device, the electronic device is communicatively connected to a task offloading system, and the task offloading system includes a second device and at least one first device. The task offloading device may include:

A task acquisition module configured to acquire pending tasks of the at least one first device, wherein the pending tasks include target tasks;

A strategy acquisition module configured to input the task to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;

A policy sending module configured to send the task offloading policy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading policy, and the second device executes the target task.

The present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, the task offloading method described in any one of the preceding embodiments is implemented.

The present application provides a storage medium, the storage medium includes a computer program, and when the computer program runs, the computer program controls the electronic device where the storage medium is located to execute the task offloading method described in any one of the foregoing implementation manners.

In the task offloading method and device, electronic device, and storage medium provided in the embodiments of the present application, a task offloading strategy is obtained by inputting the task to be processed into the task offloading model, and the task offloading strategy is sent to the first device, so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to the server for processing, which avoids the low task offloading efficiency of related technologies, in which tasks are either all executed locally on the wireless user equipment or all offloaded and executed remotely on the server.

Another aspect of the present application also provides a method and device for scheduling optimization, electronic equipment, and storage media, so as to improve the problems existing in related technologies.

Another aspect of the present application also provides a scheduling optimization method. The scheduling optimization method is applied to an electronic device, and the electronic device is communicatively connected to a mobile edge computing network system. The mobile edge computing network system includes at least one base station, unmanned aerial vehicle (UAV), and mobile device. The scheduling optimization method may include:

Obtain pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;

Inputting the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;

sending the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to the at least one UAV for processing and forwards the second task through the at least one UAV to the at least one base station for processing.

In an optional implementation manner, the scheduling optimization method may be implemented by using the task offloading method according to the implementation manners of the present application.

In an optional implementation manner, the scheduling optimization method further includes the step of obtaining a scheduling optimization model, which may include:

Establishing an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system;

The initial model is trained according to the optimization objective function to obtain a scheduling optimization model.

In an optional embodiment, the step of establishing an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system may include:

establishing an initial model based on initial parameters of the at least one base station, UAV, and mobile device;

An optimization objective function is established according to the initial model.

In an optional embodiment, the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and the step of training the initial model according to the optimization objective function to obtain the scheduling optimization model may include:

performing split processing on the optimization objective function to obtain a first optimization objective function, a second optimization objective function and a third optimization objective function;

The initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, according to the second optimization objective function to obtain the computing task joint scheduling model, and according to the third optimization objective function to obtain the resource allocation model.

In an optional implementation manner, the step of inputting the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy may include:

inputting the current location information into the UAV trajectory planning model, and calculating predicted location information of the at least one mobile device;

inputting the to-be-processed tasks and predicted location information into the task joint scheduling model, and calculating task scheduling decision variables of the at least one mobile device;

Input the pending tasks and task scheduling decision variables into the resource allocation model to calculate a scheduling strategy.
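The three steps above form a pipeline in which each model consumes the output of the previous one. The following is a minimal sketch of that data flow only. The function name, the callable interfaces and the stand-in models (a fixed location shift, a size threshold, an equal resource share) are hypothetical placeholders introduced for illustration and are not the trained models of this application.

```python
def compute_scheduling_strategy(pending_tasks, current_locations,
                                trajectory_model, joint_scheduling_model,
                                resource_allocation_model):
    """Chain the three sub-models in the order described above."""
    # Step 1: current locations -> predicted locations.
    predicted_locations = trajectory_model(current_locations)
    # Step 2: pending tasks + predicted locations -> scheduling decision variables.
    decisions = joint_scheduling_model(pending_tasks, predicted_locations)
    # Step 3: pending tasks + decision variables -> scheduling strategy.
    return resource_allocation_model(pending_tasks, decisions)

# Hypothetical stand-ins so that the sketch runs end to end.
def toy_trajectory_model(locations):
    # Assumption: every device drifts one unit along x.
    return [(x + 1.0, y) for x, y in locations]

def toy_joint_scheduling_model(task_sizes, predicted_locations):
    # Assumption: small tasks go to the UAV, large ones to the base station.
    return ["uav" if size < 5 else "base_station" for size in task_sizes]

def toy_resource_allocation_model(task_sizes, decisions):
    # Assumption: every task receives an equal share of one unit of resource.
    share = 1.0 / len(task_sizes)
    return [{"target": d, "share": share} for d in decisions]

strategy = compute_scheduling_strategy(
    pending_tasks=[3, 8],
    current_locations=[(0.0, 0.0), (2.0, 1.0)],
    trajectory_model=toy_trajectory_model,
    joint_scheduling_model=toy_joint_scheduling_model,
    resource_allocation_model=toy_resource_allocation_model,
)
```

Passing the models in as callables keeps the pipeline independent of how each model is trained, which matches the way the three models are obtained from three separate optimization objective functions.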

In an optional embodiment, the step of inputting the current location information into the UAV trajectory planning model and calculating the predicted location information of the at least one mobile device may include:

performing motion prediction processing according to the current location information to obtain the next location information of the at least one mobile device;

Perform clustering processing on the next location information of the at least one mobile device to obtain predicted location information.
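The drawings of this application refer to an FCM-based mobile device clustering algorithm (FIG. 15). As an illustration of the clustering step only, the following is a minimal sketch of generic fuzzy c-means applied to hypothetical next-location coordinates. The function name, the fuzzifier m = 2, the iteration count, the random initialisation and the sample points are assumptions, and the sketch is not the specific algorithm of FIG. 15.

```python
import numpy as np

def fuzzy_c_means(points, n_clusters, m=2.0, iters=100, seed=0):
    """Generic fuzzy c-means; returns (cluster centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((len(points), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # Centers are means of the points weighted by membership^m.
        w = u ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        # Distance of every point to every center (floored to avoid 0-division).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Membership update: proportional to dist^(-2/(m-1)), rows sum to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Hypothetical predicted next locations of six mobile devices.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centers, memberships = fuzzy_c_means(points, n_clusters=2)
```

Under these assumptions the cluster centers can serve as the predicted location information, for example as candidate positions toward which a UAV is steered.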

In an optional implementation manner, the step of inputting the to-be-processed tasks and predicted location information into the task joint scheduling model, and calculating the task scheduling decision variables of the at least one mobile device may include:

performing task joint scheduling training processing according to the pending task and predicted location information, to obtain the decision-making action of the at least one mobile device;

The decision-making actions are integrated to obtain task scheduling decision variables.

The present application provides a scheduling optimization device, which is applied to an electronic device. The electronic device is communicatively connected to a mobile edge computing network system, and the mobile edge computing network system includes at least one base station, UAV, and mobile device. The scheduling optimization device includes:

The task acquisition module may be configured to: acquire the pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;

The strategy acquisition module may be configured to: input the task to be processed and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;

A policy sending module, configured to: send the scheduling policy to the at least one mobile device, so that the at least one mobile device, based on the scheduling policy, sends the first task to the at least one UAV for processing and forwards the second task through the at least one UAV to the at least one base station for processing.

In an optional implementation manner, the scheduling optimization device is implemented as the task offloading device according to the implementation manners of the present application.

The present application provides an electronic device, which may include: a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, it implements the task offloading method and/or the scheduling optimization method described in any one of the foregoing embodiments.

The present application provides a storage medium. The storage medium may include a computer program, and when the computer program runs, the electronic device where the storage medium is located is controlled to execute the task offloading method and/or the scheduling optimization method described in any one of the preceding embodiments.

In the scheduling optimization method and device, electronic equipment, and storage medium provided in the embodiments of the present application, a scheduling strategy is obtained by inputting the tasks to be processed and the current location information into the preset scheduling optimization model, and the scheduling strategy is sent to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. The first task is thereby scheduled to the UAV and the second task to the base station, which avoids the low scheduling optimization efficiency of related technologies, in which tasks are either all executed locally on the mobile device or all dispatched to the UAV or the base station for remote execution.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope; those skilled in the art can also obtain other related drawings based on these drawings without creative work.

Fig. 1 shows a structural block diagram of a data processing system provided by some embodiments of the present application.

FIG. 2 shows a structural block diagram of a task offloading system provided by an embodiment of the present application.

FIG. 3 is a schematic flowchart of a task offloading method provided by an embodiment of the present application.

FIG. 4 is a schematic structural diagram of a task offloading model provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of a deep reinforcement learning model provided by an embodiment of the present application.

FIG. 6 is a schematic flowchart of the ECRA algorithm provided by the embodiment of the present application.

FIG. 7 is another schematic flow chart of the task offloading method provided by the embodiment of the present application.

Fig. 8 shows a structural block diagram of a data processing system provided by other embodiments of the present application.

FIG. 9 shows a structural block diagram of a scheduling optimization system provided by an embodiment of the present application.

FIG. 10 is a schematic flowchart of a scheduling optimization method provided by an embodiment of the present application.

FIG. 11 is another schematic flowchart of the scheduling optimization method provided by the embodiment of the present application.

FIG. 12 is a schematic structural diagram of a scheduling optimization model provided by an embodiment of the present application.

FIG. 13 is a schematic structural diagram of the LSTM network provided by the embodiment of the present application.

FIG. 14 is a schematic structural diagram of an LSTM network-based mobile device location prediction model provided by an embodiment of the present application.

FIG. 15 is a schematic flowchart of an FCM-based mobile device clustering algorithm provided by an embodiment of the present application.

FIG. 16 is a schematic structural diagram of the actor neural network and the evaluator neural network provided by the embodiment of the present application.

FIG. 17 is a schematic flowchart of a DDPG-based computing task scheduling algorithm provided by an embodiment of the present application.

FIG. 18 is a schematic flowchart of a scheduling variable shaping integration algorithm provided by an embodiment of the present application.

FIG. 19 shows a structural block diagram of an electronic device provided by an embodiment of the present application.

FIG. 20 is a structural block diagram of a task offloading device provided by an embodiment of the present application.

FIG. 21 is a structural block diagram of a scheduling optimization device provided by an embodiment of the present application.

Reference numerals: 10-data processing system; 100-electronic equipment; 110-first memory; 120-first processor; 130-communication module; 200-task offloading system; 300-scheduling optimization system; 210-first device; 220-second device; 400-task offloading device; 410-task acquisition module; 420-offloading strategy acquisition module; 430-offloading strategy sending module; 500-task scheduling device; 510-task acquisition module; 520-scheduling strategy acquisition module; 530-scheduling policy sending module.

Detailed description

With the rapid development of wireless communication technology and the popularity of smart mobile devices, the number of mobile applications has grown explosively in recent years. Applications such as face recognition payment systems, online cloud games, and virtual/augmented reality (VR/AR) are computing-intensive and delay-critical, whereas the mobile devices that run them (such as smartphones and wearable devices) often have only limited computing power and battery capacity. This mismatch between computing-intensive applications and resource-constrained devices makes it challenging to improve the quality of experience (QoE) for users.

Mobile edge computing (MEC) is a promising technology that can provide powerful computing power and energy resources for users' mobile devices by deploying edge servers in the edge computing network. Computing-intensive tasks are offloaded to the edge server to reduce task execution delay and save the battery energy consumed by local devices. At the same time, with the development of wireless power transfer (WPT) technology, the battery of wireless user equipment can be continuously charged through wireless transmission, which greatly prolongs the battery supply time and alleviates the limitations that insufficient energy places on wireless user equipment.

One of the key problems to be solved in a mobile edge computing network is the computation offloading problem: whether the wireless user equipment offloads computing tasks to a nearby MEC server or executes them locally, and how to allocate resources (such as computing resources and energy resources) to the tasks offloaded to the server. Generally, a wireless network consists of multiple wireless user equipments, and the time-varying channel conditions caused by the mobility of the wireless user equipments complicate the offload scheduling process. A good computation offloading strategy can improve the overall computing power of the wireless user equipment and enhance the performance of the mobile edge computing system. Therefore, much recent research and many recent inventions have focused on designing efficient computation offloading and resource allocation strategies.

Some existing inventions and studies propose using dynamic programming algorithms and branch-and-bound methods to offload computing tasks and allocate resources in mobile edge computing networks. However, these methods have high computational complexity and take a long time to solve for the optimization variables, so they are applicable only to scenarios with relatively simple network environments. Offloading optimization methods based on heuristic algorithms can reduce computational complexity, but they usually require a large number of iterations to reach satisfactory results, so they may not be practical for online computation offloading in dynamic mobile edge computing networks (that is, networks with time-varying channel conditions caused by the movement of wireless user equipment).

In order to improve at least one of the above-mentioned technical problems raised by the present application, embodiments of the present application provide a task offloading method and device, electronic equipment, and a storage medium. The technical solution of the present application will be described below through possible implementation modes.

The defects in the above solutions are all results obtained by the inventors after practice and careful research. Therefore, the discovery of the above problems, and the solutions proposed below by the embodiments of the present application for those problems, should both be regarded as the inventors' contribution made in the course of the invention.

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. The components of the embodiments of the application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

Accordingly, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.

It should be noted that the terms "comprising", "including" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

It should be noted that like numerals and letters denote similar items in the following figures, therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

It should be noted that, in the case of no conflict, the features in the embodiments of the present application may be combined with each other.

Fig. 1 is a structural block diagram of a data processing system 10 provided by some embodiments of the present application, which provides a possible implementation of the data processing system 10. Referring to Fig. 1, the data processing system 10 may include one or more of an electronic device 100 and a task offloading system 200.

Wherein, the electronic device 100 communicates with the task offloading system 200, and the electronic device 100 obtains the tasks to be processed by the task offloading system 200, and obtains a task offloading strategy according to the pending tasks, so that the task offloading system 200 performs task offloading processing according to the task offloading strategy.

Optionally, the specific composition of the task offloading system 200 is not limited, and can be set according to actual application requirements. For example, in an alternative example, the task offloading system 200 may include a second device 220 and at least one first device 210 .

It should be noted that, in an alternative example, the electronic device 100 and the first device 210 may be the same device; in another alternative example, the electronic device 100 and the second device 220 may be the same device.

Optionally, specific types of the first device 210 and the second device 220 are not limited, and may be set according to actual application requirements. For example, in an alternative example, the first device 210 may be a wireless user device, and the second device 220 may be an edge computing server.

With reference to FIG. 2, a large-scale mobile edge computing network includes an edge computing server with a wireless access point (AP) and N wireless user equipments, indexed by i ∈ {1, 2, ..., N}. Each wireless user equipment can move within a certain range, and the wireless access point has a stable energy supply and can transmit power to the wireless user equipment by radio frequency. Each wireless user equipment is equipped with a wireless transmission antenna, which can exchange data with the wireless access point and can also receive energy from it. The energy received from the wireless access point is stored in the rechargeable battery of the wireless user equipment.

FIG. 3 shows one of the flow charts of the task offloading method provided by the embodiment of the present application. The method can be applied to the electronic device 100 shown in FIG. 19 (described below), and is executed by the electronic device 100 in FIG. 19 . It should be understood that in other embodiments, the order of some steps in the task offloading method of this embodiment may be exchanged according to actual needs, or some steps may be omitted or deleted. The flow of the task offloading method shown in FIG. 3 will be described in detail below.

Step S310, acquiring pending tasks of at least one first device 210.

Wherein, the tasks to be processed include target tasks.

Step S320, inputting the tasks to be processed into a preset task offloading model to obtain a task offloading strategy.

Wherein, the task offloading model is trained based on the established system model.

Step S330, sending the task offloading policy to at least one first device 210, so that at least one first device 210 offloads the target task to the second device 220 based on the task offloading policy, and the second device 220 executes the target task.

The above method obtains the task offloading strategy by inputting the tasks to be processed into the task offloading model, and sends the task offloading strategy to the first device, so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to the server for processing, which avoids the low task offloading efficiency of related technologies, in which tasks are either all executed locally on the wireless user equipment or all offloaded and executed remotely on the server.
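The flow of steps S310 to S330 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary-based task representation, the threshold-based stand-in for the preset task offloading model and the list used to record sent policies are all hypothetical and are not defined by this application.

```python
def run_offloading_round(pending_by_device, offloading_model, send_policy):
    """One round of the S310-S330 flow on the electronic device."""
    # S310: acquire the pending tasks of each first device.
    pending = dict(pending_by_device)
    # S320: feed the pending tasks to the preset task offloading model.
    strategy = offloading_model(pending)
    # S330: send each first device its part of the offloading strategy.
    for device_id, policy in strategy.items():
        send_policy(device_id, policy)
    return strategy

# Hypothetical stand-in for the trained model: offload any task above 1 Mbit.
def toy_offloading_model(pending):
    return {dev: [size_bits > 1e6 for size_bits in sizes]
            for dev, sizes in pending.items()}

sent = []  # records (device_id, policy) pairs instead of real transmission
strategy = run_offloading_round(
    pending_by_device={1: [2e6, 5e5], 2: [3e5]},
    offloading_model=toy_offloading_model,
    send_policy=lambda device_id, policy: sent.append((device_id, policy)),
)
```

In this sketch a policy entry of True marks a target task that the first device offloads to the second device, while False marks a task kept for local execution, so a single round can mix local and remote execution.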

Before step S310, the task offloading method provided by the present application may also include the step of obtaining a task offloading model, which may include:

Establish a system model and an optimization cost function according to the cost parameters of the task offloading system 200; train the system model according to the optimization cost function to obtain the task offloading model.

Optionally, the specific way of establishing the system model and the optimization cost function according to the cost parameters of the task offloading system 200 is not limited, and can be set according to actual application requirements. For example, in an alternative example, the following sub-steps may be included:

Establishing a system model according to the cost parameters of at least one first device 210 and the second device 220; establishing an optimization cost function according to the system model.

In detail, the system model is established first. The entire system time is divided into multiple time slices of constant length, denoted as t ∈ {1, 2, ...}, each T seconds long. It is assumed that when a wireless user equipment generates a computation-intensive task in time slice t, the execution time of the task does not exceed the length of one time slice. The computing power of the MEC server where the wireless access point is deployed is much stronger than that of the wireless user equipment. Therefore, each wireless user equipment can choose either to execute its task remotely on the server through computation offloading or to execute it locally.

In each time slice t, the wireless channel gain between a wireless user equipment and the wireless access point strongly affects the efficiency of both wireless power transmission and task data transmission. This application uses $g_i^t$ to denote the channel gain between the i-th wireless user equipment and the wireless access point in time slice t. The time slice is short enough that $g_i^t$ remains constant within it. According to the Rayleigh fading channel model, the wireless channel gain can be expressed as

$$g_i^t = \epsilon^t \bar{g}_i^t,$$

where $\epsilon^t$ denotes an independent exponential random variable with unit mean, and $\bar{g}_i^t$ is expressed by the following formula:

$$\bar{g}_i^t = A_g \left( \frac{3 \times 10^8}{4 \pi f_c d_i^t} \right)^{l_e},$$

where $A_g$ represents the antenna gain, $f_c$ represents the carrier frequency, $l_e$ represents the path loss exponent, and $d_i^t$ indicates the distance between the i-th wireless user equipment and the wireless access point on the two-dimensional plane. The formula shows that the wireless channel gain decreases as the distance $d_i^t$ increases.
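The channel model can be illustrated with a minimal sketch. It assumes that the average gain takes the form A_g * (c / (4 * pi * f_c * d)) ** l_e, where c is the speed of light; this form, the function names and the numeric values are illustrative assumptions, not values given in this application.

```python
import math
import random

SPEED_OF_LIGHT = 3e8  # metres per second


def mean_channel_gain(antenna_gain, carrier_freq_hz, distance_m, path_loss_exp):
    """Average gain, assumed form: A_g * (c / (4*pi*f_c*d)) ** l_e."""
    ratio = SPEED_OF_LIGHT / (4.0 * math.pi * carrier_freq_hz * distance_m)
    return antenna_gain * ratio ** path_loss_exp


def sample_channel_gain(mean_gain, rng):
    """Rayleigh fading: scale the average gain by a unit-mean exponential draw."""
    return rng.expovariate(1.0) * mean_gain


# Illustrative parameters (hypothetical): gain 4.11, 915 MHz carrier, exponent 2.8.
rng = random.Random(0)
near = mean_channel_gain(4.11, 915e6, 5.0, 2.8)
far = mean_channel_gain(4.11, 915e6, 10.0, 2.8)
faded = sample_channel_gain(near, rng)
```

Doubling the distance divides the average gain by 2 ** l_e, which matches the statement that the gain falls as the distance grows.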

Secondly, the energy harvesting model of the wireless user equipment is established. At the beginning of each time slice t, the edge computing server charges each user equipment for $q^t T$ seconds through wireless power transmission, where $q^t \in [0, 1]$. The energy obtained by the i-th wireless user equipment is:

$$H_i^t = \mu P_i g_i^t q^t T,$$

where $\mu \in (0, 1)$ represents the efficiency of wireless energy harvesting, $P_i$ represents the transmission power between the wireless access point and the user equipment, $g_i^t$ is the channel gain of the i-th wireless user equipment in time slice t, and $q^t$ represents the proportion of the time slice used for wireless charging.

This application assumes that the battery energy of each wireless user equipment is limited. At the end of time slice t (that is, the beginning of time slice t+1), the remaining energy of the user equipment is:

$$M^{t+1} = \min\{M^t - E^t + H^t,\ M_{\max}\},$$

where $E^t$ is the energy consumed in time slice t, $H^t$ is the energy obtained through wireless power transmission in time slice t, and $M_{\max}$ is the maximum energy that the battery of the wireless user equipment can hold. Under normal circumstances, $M^{t+1}$ is non-negative. If the energy available in the current time slice is insufficient ($M^{t+1} < 0$), the wireless user equipment discards the current task, sets $M^{t+1}$ to 0, and re-executes the task in the next time slice.
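The energy harvesting and battery rules can be sketched as follows. The sketch assumes that the harvested energy is the product of harvesting efficiency, transmission power, channel gain, charging ratio and slice length, and that the battery level is capped at its capacity; the function and parameter names are hypothetical.

```python
def harvested_energy(mu, tx_power_w, channel_gain, charge_ratio, slot_s):
    """Energy harvested in one time slice (assumed product form)."""
    if not 0.0 <= charge_ratio <= 1.0:
        raise ValueError("charge_ratio must lie in [0, 1]")
    return mu * tx_power_w * channel_gain * charge_ratio * slot_s


def next_battery(level_j, harvested_j, consumed_j, capacity_j):
    """Return (next battery level, task_dropped) for one time slice."""
    raw = level_j - consumed_j + harvested_j
    if raw < 0.0:
        # Not enough energy: discard the task and reset the level to 0.
        return 0.0, True
    # Otherwise the level is capped at the battery capacity.
    return min(raw, capacity_j), False
```

A dropped task is reported through the second return value so that a caller can re-queue it for the next time slice or apply a penalty.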

Then the computing task model is established. The task generated by the i-th wireless user equipment in time slice t can be expressed as the pair $(b_i^t, c_i^t)$, where $b_i^t$ indicates the amount of data of the task (unit: bit) and $c_i^t$ indicates the number of CPU cycles required to process 1 bit of data. Executing the task therefore requires $b_i^t c_i^t$ CPU cycles.

Define W as the bandwidth of the wireless channel, and assume that the interference between channels can be ignored. If k wireless user equipments offload their current tasks at the same time in time slice t, the wireless bandwidth W is evenly allocated among the user equipments that decide to offload, so that each receives W/k.

After obtaining the energy transmitted from the wireless access point, each wireless user equipment needs to decide whether to offload its computing task to the edge server or execute it locally, so that scheduling can be optimized to reduce the overall task delay and energy consumption. This application adopts a complete (binary) offloading scheme: a task arriving in the current time slice is either executed entirely locally on the wireless user equipment or executed entirely remotely on the MEC server through computation offloading. Let $x_i^t \in \{0, 1\}$ denote the offloading decision variable of the i-th wireless user equipment in time slice t, where $x_i^t = 1$ indicates that the wireless user equipment chooses to offload the task to the edge computing server (edge computing), and $x_i^t = 0$ indicates that the computing task is performed locally on the wireless user equipment. The two modes are described below:

1) Local computing model:

The wireless user equipment in the mobile edge computing network of this application can harvest energy wirelessly and perform local computing at the same time. Let $f_i^{\mathrm{loc}}$ indicate the computing capability of the i-th wireless user equipment (unit: CPU cycles/second); different devices have different computing capabilities. The local computing delay $T_i^{\mathrm{loc},t}$ of the task, which has data amount $b_i^t$ and requires $c_i^t$ CPU cycles per bit, is expressed as:

$$T_i^{\mathrm{loc},t} = \frac{b_i^t c_i^t}{f_i^{\mathrm{loc}}}.$$

The energy $E_i^{\mathrm{loc},t}$ consumed by local computing is:

$$E_i^{\mathrm{loc},t} = b_i^t c_i^t e_i,$$

where $e_i$ indicates the energy consumed by the i-th wireless user equipment in one CPU cycle. Specifically, $e_i$ can be calculated by the following formula:
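The local computing model can be sketched under the assumption that the delay equals the required CPU cycles divided by the device's computing capability, and that the energy equals the required CPU cycles multiplied by the energy per cycle. The per-cycle energy is passed in as a parameter because its formula is not reproduced here; all names and numbers are hypothetical.

```python
import math


def local_delay(bits, cycles_per_bit, cpu_hz):
    """Seconds needed to process the task on the device itself."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return bits * cycles_per_bit / cpu_hz


def local_energy(bits, cycles_per_bit, energy_per_cycle_j):
    """Joules consumed by processing the task locally."""
    return bits * cycles_per_bit * energy_per_cycle_j


# Hypothetical task: 1 Mbit at 100 cycles/bit on a 1 GHz device.
delay = local_delay(1e6, 100.0, 1e9)
energy = local_energy(1e6, 100.0, 1e-9)
```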

2) Edge computing model:

If the i-th wireless user equipment chooses to offload its task to the edge computing server for remote execution, the computation offloading process can be divided into three parts: first, the wireless user equipment uploads the task data to the edge computing server through wireless transmission; then, the edge computing server allocates computing resources to the offloaded task and completes its computation; finally, the calculation result is sent back to the corresponding wireless user equipment through wireless transmission. Since the size of the calculation result is much smaller than the amount of task data, this application ignores the transmission delay and energy consumption caused by downloading the result. Therefore, the computation offloading delay from the i-th wireless user equipment to the edge computing server can be expressed as:

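The time the edge computing server takes to run the task is:

$$T_i^{\mathrm{exe},t} = \frac{b_i^t c_i^t}{f_i^t},$$

where $b_i^t c_i^t$ is the number of CPU cycles the task requires and $f_i^t$ indicates the computing resources that the edge server allocates to the task (unit: CPU cycles/second). With F representing the computing resources of the entire edge server, the allocation must satisfy:

$$\sum_{i=1}^{N} x_i^t f_i^t \le F.$$

That is to say, the total amount of computing resources that the edge server allocates to all offloaded tasks (those with $x_i^t = 1$) must not exceed the computing resources F of the entire server.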

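While the edge server executes the task remotely, the i-th wireless user equipment waits locally. The energy it consumes during this period can be expressed by the following formula:

$$E_i^{\mathrm{wait},t} = p_i^{\mathrm{idle}} T_i^{\mathrm{exe},t},$$

where $p_i^{\mathrm{idle}}$ indicates the power consumption of the i-th wireless user equipment in the idle state and $T_i^{\mathrm{exe},t}$ is the time the edge server takes to run the task.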
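The edge computing model can be sketched in the same way. The sketch covers the equal bandwidth split among offloading devices, the edge execution time, the server capacity check and the idle waiting energy (assumed to be idle power multiplied by waiting time). The upload delay is omitted because its formula is not reproduced here; all names are hypothetical.

```python
import math


def per_device_bandwidth(total_bw_hz, offload_flags):
    """Split the bandwidth W evenly among the k devices that offload."""
    k = sum(offload_flags)
    return total_bw_hz / k if k else 0.0


def edge_exec_time(bits, cycles_per_bit, alloc_hz):
    """Seconds the edge server needs with alloc_hz cycles/second allocated."""
    if alloc_hz <= 0:
        raise ValueError("alloc_hz must be positive")
    return bits * cycles_per_bit / alloc_hz


def allocation_feasible(offload_flags, alloc_hz, capacity_hz):
    """Local devices get no resources and the total stays within capacity F."""
    for flag, alloc in zip(offload_flags, alloc_hz):
        if alloc < 0 or (flag == 0 and alloc != 0):
            return False
    return sum(alloc_hz) <= capacity_hz


def idle_wait_energy(idle_power_w, wait_s):
    """Energy the device spends idling while the server runs its task."""
    return idle_power_w * wait_s
```

For example, `allocation_feasible([1, 0, 1], [2e9, 0.0, 3e9], 5e9)` holds, whereas giving resources to the second (local) device or exceeding 5e9 in total does not.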

Based on the network system model established above, this application proposes an optimization cost function that minimizes the total system cost through the joint optimization of task offloading and resource allocation. The specific optimization objective problem is described as follows:

$$0 < q^t < 1, \quad (d)$$

The optimization cost function of the entire system in the above formula is divided into two parts: the local computing cost and the cost of offloading the computation to the edge server. These two costs are expressed specifically as:

where $\omega_1$ and $\omega_3$ are the weights of the task processing delay, and $\omega_2$ and $\omega_4$ are the weights of the energy consumption. The weights satisfy $0 \le \omega_k \le 1$ for $k \in \{1, 2, 3, 4\}$, with $\omega_1 + \omega_2 = 1$ and $\omega_3 + \omega_4 = 1$.
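The role of the weights can be illustrated with a minimal sketch. It assumes that each cost is a weighted sum of delay and energy whose two weights add up to 1, and that the system cost adds the local cost of devices that compute locally to the edge cost of devices that offload. This form is an assumption made for illustration; the function names are hypothetical.

```python
import math


def weighted_cost(delay_s, energy_j, w_delay, w_energy):
    """Weighted sum of delay and energy with weights in [0, 1] summing to 1."""
    if not (0.0 <= w_delay <= 1.0 and 0.0 <= w_energy <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    if not math.isclose(w_delay + w_energy, 1.0):
        raise ValueError("weights must sum to 1")
    return w_delay * delay_s + w_energy * energy_j


def total_cost(offload_flags, local_costs, edge_costs):
    """Use the edge cost where x_i = 1 and the local cost where x_i = 0."""
    return sum(
        edge if flag else local
        for flag, local, edge in zip(offload_flags, local_costs, edge_costs)
    )
```

A larger delay weight favors decisions that finish tasks quickly, while a larger energy weight favors decisions that preserve battery energy.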

In problem P, $x^t$ denotes the offloading decision variables of all wireless user equipments, $h^t$ denotes, for each wireless user equipment, the proportion of its total energy that is consumed in offloading data, and $f^t$ is the resource allocation vector, each component of which represents the computing resources allocated by the edge server to one uploaded task. This application stipulates that if wireless user equipment i chooses to execute its task locally, the edge server allocates no computing resources to it; that is, when $x_i^t = 0$, $f_i^t = 0$.

Constraint (a) indicates that each wireless user equipment either offloads its task to the server or executes it locally. Constraint (b) indicates that the computing resources allocated by the edge server to any wireless user equipment that offloads a task cannot exceed the maximum resource value. Constraint (c) ensures that the sum of allocated computing resources does not exceed the maximum resource value of the edge server. Constraint (f) stipulates that in time slice t the current energy of each wireless user equipment can be neither greater than the maximum energy that the equipment can hold nor negative; otherwise a penalty term must be added.

Optionally, the system model is trained according to the optimization cost function to obtain the task offloading model. The specific method is not limited, and can be set according to actual application requirements. For example, in an alternative example, the task offloading model includes a first task offloading model and a second task offloading model, the system model is trained according to an optimized cost function, and the step of obtaining the task offloading model may include the following substeps:

Segment the optimization cost function to obtain a first optimization cost function and a second optimization cost function; train the system model according to the first optimization cost function to obtain the first task offloading model; train the system model according to the second optimization cost function to obtain the second task offloading model.

In detail, the original optimization problem can be decomposed into two sub-problems: 1) computation offloading and energy transmission for the wireless user equipment, and 2) allocation of the edge computing server's computing resources and energy. FIG. 4 shows the system optimization framework that combines the deep reinforcement learning method with the alternating direction method of multipliers.

Problem P of the optimization function is a mixed-integer nonlinear programming (MINLP) problem, and is therefore non-convex. As the number of users N increases, the computational complexity of the problem grows sharply, which makes it difficult to solve directly. The four variables to be determined, $(x^t, f^t, q^t, h^t)$, depend on one another; for example, if a component of $x^t$ is 0, the corresponding components of $f^t$ and $h^t$ are also 0. This application therefore decomposes the problem into the following two sub-problems, such that the variables to be determined within each sub-problem do not depend on one another: 1) computation offloading and energy transmission of the wireless user equipment (P1), that is, how to determine $x^t$ and $q^t$; 2) allocation of the edge computing server's computing resources and energy (P2), that is, how to determine $f^t$ and $h^t$. Once the values of $x^t$ and $q^t$ are determined, $f^t$ and $h^t$ become easy to solve.

Optionally, the specific manner of training the system model according to the first optimization cost function to obtain the first task offloading model is not limited, and can be set according to actual application requirements. For example, in an alternative example, the following sub-steps may be included:

A deep reinforcement learning model is established based on the system model; the deep reinforcement learning model is trained according to the first optimization cost function to obtain a first task offloading model.

In detail, for subproblem P1, the computational offloading decision optimization problem for tasks generated by each wireless user equipment is still a non-convex problem. Traditional numerical optimization methods often require a large number of iterative calculations to obtain satisfactory results, which makes them unsuitable for real-time MEC in dynamic environments where channel gain changes. Therefore, this application adopts reinforcement learning to realize real-time scheduling of computing offloading.

In a computation offloading environment where channel conditions and wireless user equipment locations change dynamically, the system state transition probabilities of the mobile edge computing network in sub-problem P1 are usually unobtainable because of the high-dimensional state and action spaces. The deep reinforcement learning method of this application therefore lets each wireless user equipment choose, according to the current system state, whether to offload the task arriving in time slice t to the edge server.

The specific P1 problem can be expressed as:

First, the method based on reinforcement learning needs to define the state, action and reward function of solving the problem, as follows:

State: In each time slice t, the state of the mobile edge computing network includes the distance $d^t$ between each wireless user equipment and the wireless access point, the channel gain $g^t$, the data volume $b^t$ of each computing task currently to be processed, and the available energy $M^t$ at the beginning of time slice t, i.e., $s^t = [d^t, g^t, b^t, M^t]$.

Action: According to the definition of problem P1, the computation offloading vector $x^t$ of the wireless user equipments and the energy transfer variable $q^t$ must be determined, i.e., $a^t = [x^t, q^t]$. Based on the observed state $s^t$, the reinforcement learning method learns the policy $\pi$ of the system and thereby obtains an approximately optimal mapping from the state $s^t$ to the action $a^t$.

Reward function: After the value of the action $a^t = [x^t, q^t]$ is determined, the values of $f^t$ and $h^t$ can be solved by the ECRA algorithm. The optimization objective is to minimize the sum of the cost terms, whereas the goal of reinforcement learning is to obtain the maximum reward. Therefore, the immediate reward function of the reinforcement learning algorithm can be defined as:

in which a penalty term is included for the case where the energy of a wireless user equipment is not enough to execute the task arriving in the current time slice (that is, $M^{t+1} < 0$). The task must then be discarded, so the penalty term is introduced to prevent this situation as far as possible. In this application, the indicator function $\mathbb{1}\{\mathrm{cond}\}$ indicates that the task-failure penalty applies when the condition cond is met, so the penalty cost function is expressed as:

where $\lambda_1$ and $\lambda_2$ are the penalty weights, and $|\cdot|$ represents the absolute value.

After completing the above problem definition, this application improves the exploration strategy for complex high-dimensional action spaces on the basis of the twin delayed deep deterministic policy gradient (TD3) algorithm and proposes the RL-based approach for Computation Offloading and Energy Transmission (RLCOET). This avoids slow convergence or convergence to a local optimum caused by the difficulty of fully exploring the action space.

The TD3 algorithm includes two critic networks and one action (actor) network. The two critic networks estimate two Q values (value predictions), namely $Q_{\delta_1}$ and $Q_{\delta_2}$, where $\delta_1$ and $\delta_2$ are the parameters of the two critic networks. The action network takes the current state as input and outputs the corresponding action. To speed up learning when the dimension of the action space is high, this application improves the exploration-exploitation strategy of the original algorithm. The action $a^t$ generated by this strategy is combined with the ECRA optimization method to calculate the remaining optimization variables of the current time slice, and then the current reward $R^t$ and the next state $s^{t+1}$ are obtained. The tuple $(s^t, a^t, R^t, s^{t+1})$ is stored in the experience pool as one experience obtained from an interaction with the environment, and prioritized experience replay is used to select a batch of experiences with large loss values to train the neural networks. The techniques used in the RLCOET algorithm are as follows:

1) Generation and selection of action candidate solution sets:

Since the action $a^t = [x^t, q^t]$ output by the action network of the RLCOET algorithm has N+1 dimensions in total, it lies in a high-dimensional space. Exploring the action space by directly adding Gaussian noise is effective only for a small number of action variables; in a high-dimensional space it is difficult for the neural network to learn the optimal strategy through such exploration, so this application improves the exploration strategy in the action space. As shown in FIG. 5, the action network has two branches. One predicts the energy transfer ratio $q^t$, a one-dimensional continuous variable between 0 and 1; during action exploration, Gaussian noise is added to this term and the result is clipped so that it remains between 0 and 1. The other part, $x^t$, is an N-dimensional discrete vector whose search space has size $2^N$. For this part, the action network outputs a continuous relaxed decision variable $\hat{x}^t$, from which K discrete decision actions $x_1^t, \ldots, x_K^t$ are generated by the order-preserving quantization method. The order-preserving quantization method balances computational complexity against model performance and can achieve an extensive search of the $x^t$ action space even when K is small. For each generated offloading decision vector $x_k^t$, the $f^t$ and $h^t$ calculated by the ECRA algorithm are substituted into the immediate reward function to calculate the reward values $R_k^t$ of the K candidates. The action corresponding to the highest reward value is taken as the current optimal offloading decision, denoted $x^{t*}$, that is:

$$x^{t*} = \arg\max_{x_k^t,\ k \in \{1, \ldots, K\}} R_k^t.$$
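Candidate generation and selection can be sketched as follows. The quantizer uses a common threshold-based construction of order-preserving quantization: the first candidate thresholds the relaxed output at 0.5, and each further candidate uses as its threshold the entry that is next closest to 0.5. The application does not spell out its quantizer, so this construction, the function names and the toy reward are assumptions.

```python
def order_preserving_quantize(relaxed, k):
    """Generate k binary offloading vectors from a relaxed vector in [0, 1]."""
    n = len(relaxed)
    if not 1 <= k <= n + 1:
        raise ValueError("k must satisfy 1 <= k <= N + 1")
    # Candidate 1: plain rounding at 0.5.
    candidates = [[1 if v > 0.5 else 0 for v in relaxed]]
    # Remaining candidates: thresholds taken from the entries closest to 0.5.
    by_distance = sorted(range(n), key=lambda i: abs(relaxed[i] - 0.5))
    for idx in by_distance[: k - 1]:
        threshold = relaxed[idx]
        if threshold > 0.5:
            candidates.append([1 if v > threshold else 0 for v in relaxed])
        else:
            candidates.append([1 if v >= threshold else 0 for v in relaxed])
    return candidates


def select_best(candidates, reward_fn):
    """Return the candidate with the highest reward."""
    return max(candidates, key=reward_fn)


# Toy reward (hypothetical): negative Hamming distance to a known good decision.
good = [0, 1, 1, 1]
options = order_preserving_quantize([0.2, 0.4, 0.7, 0.9], 3)
best = select_best(options, lambda c: -sum(a != b for a, b in zip(c, good)))
```

Every candidate preserves the ordering of the relaxed entries: if one entry is at least as large as another, its binary value is never smaller. In practice `reward_fn` would call the resource allocation step and evaluate the immediate reward for each candidate.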

2) Prioritized experience replay:

The experience $(s^t, a^t, R^t, s^{t+1})$ obtained each time the RLCOET algorithm interacts with the system environment is stored in the experience pool, where $a^t$ and $R^t$ are the best action and the corresponding reward obtained in action generation and selection. During model training, a batch of experience samples is drawn from the experience pool to update the action network and the critic networks. Unlike the random sampling commonly used to train neural networks in reinforcement learning, this application adopts prioritized experience replay: the experience pool is built on a SumTree structure and samples are organized by priority. The higher the loss value of a sample, the higher its priority and the more likely it is to be selected to update the network parameters, which trains the network more effectively and accelerates convergence. To prevent overfitting caused by frequently selecting the same samples, and because the network is prone to outliers early in training, randomness is added to sample selection so that samples with lower priority may also be selected. The probability that sample i is selected is:

$$P(i) = \frac{p_i^{\upsilon}}{\sum_k p_k^{\upsilon}},$$

where $p_i$ is the priority of sample i, and $\upsilon$ determines the degree to which priority is used.
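The sampling rule can be sketched under the assumption that the selection probability of a sample is its priority raised to the power υ, normalized over the pool. For brevity the sketch draws samples with `random.choices` rather than a SumTree, which would make each draw logarithmic in the pool size; the function names are hypothetical.

```python
import random


def sampling_probabilities(priorities, upsilon):
    """P(i) = p_i ** upsilon / sum_k p_k ** upsilon, for positive priorities."""
    scaled = [p ** upsilon for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(priorities, upsilon, batch_size, rng):
    """Draw sample indices with replacement according to their priorities."""
    probs = sampling_probabilities(priorities, upsilon)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


rng = random.Random(0)
batch = sample_batch([1.0, 1.0, 2.0], 1.0, 4, rng)
```

With υ = 0 the rule reduces to uniform sampling, and larger values of υ concentrate the draws on samples with large loss values.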

3) Strategy update:

Let the parameters of the actor network and the corresponding target actor network be denoted as $\eta$ and $\eta'$ respectively, and the parameters of the critic networks and the corresponding target critic networks be denoted as $\delta_i$ and $\delta'_i$, $i \in \{1, 2\}$. Since the Q values output by the two critic networks differ, the smaller of the two is selected as the update target of the networks, namely:

$$y^t = R^t + \gamma \min_{i \in \{1, 2\}} Q_{\delta'_i}(s^{t+1}, \tilde{a}^{t+1}),$$

where $\gamma$ is the discount factor, $\tilde{a}^{t+1}$ is the action output by the target actor network for state $s^{t+1}$, and $y^t$ is the update target for both $Q_{\delta_1}$ and $Q_{\delta_2}$.

Because the initial values of the network parameters differ, the smaller of the values predicted by the two critic networks is used to estimate the Q value from the beginning of training. This prevents the bias caused by overestimating the Q value: even a small error can accumulate over many network updates and lead to poor performance. In addition to using delayed policy updates to avoid excessive accumulation of bias, this application also smooths the value estimate over a neighborhood of the target action to reduce errors, that is, a certain amount of noise $\zeta$ is added to the output of the target action network.

The noise $\zeta$ can be regarded as a kind of regularization, which makes the update of the value function more stable and makes the predicted target Q value $Q_{\mathrm{target}}$ more accurate and robust.
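The two ideas in this step, taking the smaller of two target Q values and smoothing the target action with clipped noise, can be sketched as follows. The sketch follows the standard TD3 form with a discount factor; the discount factor, the noise parameters and the function names are assumptions, and scalar Q values stand in for the outputs of the target critic networks.

```python
import random


def smoothed_target_action(action, noise_std, noise_clip, rng, low=0.0, high=1.0):
    """Add clipped Gaussian noise to a target action and keep it in [low, high]."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(low, min(high, action + noise))


def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Update target: reward plus discounted minimum of the two target Q values."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


rng = random.Random(0)
noisy_q = smoothed_target_action(0.9, 0.2, 0.5, rng)
target = td3_target(1.0, 0.5, 4.0, 2.0)
```

Using the minimum keeps an optimistic critic from inflating the target, and the clipping bounds how far the smoothed action can move from the output of the target action network.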

The estimate $Q_{\delta_i}$ of each critic network approximates the target value $y^t$; the loss function L is calculated as follows:

Since the action $a^t$ contains a discrete vector ($x^t$) and a continuous variable ($q^t$), the loss function of the action network also contains two parts. For the variable $q^t$, the gradient of the loss function used to update the parameters of the action network is derived as follows:

where $N_m$ is the number of samples selected from the prioritized experience replay pool. For the offloading vector $x^t$, the average cross-entropy loss is used to update the parameter $\eta$ of the action network:

where $x^t$ is the offloading vector part of $a^t$. In summary, the total loss function for updating the action network is:

where $\lambda_g$ is the weight of the loss term of the variable $q^t$.

Optionally, the specific manner of training the system model according to the second optimization cost function to obtain the second task offloading model is not limited, and can be set according to actual application requirements. For example, in an alternative example, the following sub-steps may be included:

An alternating direction method of multipliers (ADMM) model is established based on the system model; the ADMM model is trained according to the second optimization cost function to obtain the second task offloading model.

In detail, for sub-problem P2, the variables to be determined are subject to a large number of constraints, so it is difficult for reinforcement learning to obtain an ideal strategy within a limited time. Once problem P1 is solved, the original problem P becomes a convex optimization problem, which can be solved with traditional convex optimization algorithms. This application proposes an Energy and Computation-Resource Allocation (ECRA) algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve P2; its time complexity is only O(N).

That is to say, the amount of computing resources and the energy allocation ratio for each task uploaded to the edge server can be obtained by the alternating direction method of multipliers. The optimization variables $x^t$ and $q^t$ of problem P1 are obtained by the RLCOET reinforcement learning algorithm; in this step, an ADMM-based method is adopted to solve problem P2. ADMM is a computational framework for solving optimization problems and is well suited to large-scale distributed convex optimization. Through "decomposition-coordination" processing, ADMM decomposes a large global problem into multiple smaller sub-problems that are easy to solve, and coordinates the solutions of the sub-problems to obtain the solution of the global problem. It overcomes the shortcoming of penalty methods that the coefficient of the penalty term must tend to infinity near the optimal solution. To transform the original optimization problem P into a form that is easy to solve by ADMM, two additional variables $\psi^t$ and $z^t$ are introduced, and the ECRA algorithm is proposed on this basis. The converted problem P2 can be expressed as:

When $x_i^t = 0$, the values of $f_i^t$ and $h_i^t$ are irrelevant, because device i executes its task locally. P2 is thus transformed into a constrained optimization problem involving two types of variables, a structure that makes the regularization term in the optimization objective easy to handle. P2 is solved using the ADMM algorithm with the augmented Lagrangian method, as follows:

where $\alpha = \{f^t, h^t\}$, $\beta = \{\psi^t, z^t\}$, and $\epsilon = \{\theta^t, \tau^t\}$. The penalty coefficient $\rho$ ($\rho > 0$) is a fixed value. The above optimization problem is solved by iteratively updating the values of $\alpha$, $\beta$ and $\epsilon$. Assuming that the variables in the j-th round are $\alpha^j$, $\beta^j$ and $\epsilon^j$, the steps to update each variable in the (j+1)-th round are as follows:

1) Given the variables $\{\beta^j, \epsilon^j\}$ of the j-th round, update $\alpha^{j+1}$ by minimizing the augmented Lagrangian above, namely:

$$\alpha^{j+1} = \arg\min_{\alpha} L_\rho(\alpha, \beta^j, \epsilon^j).$$

Because $L_\rho(\alpha, \beta^j, \epsilon^j)$ contains a summation over the N devices, the minimization can be decomposed into N sub-problems that are computed in parallel. Each sub-problem can be expressed as:

Each sub-problem is a constrained convex optimization problem whose solution can be obtained by traditional optimization algorithms. The value of $\alpha^{j+1}$ follows from the solutions of the N sub-problems. The computational complexity of each sub-problem is O(1), so the total complexity of the N sub-problems is O(N).

2) After $\alpha^{j+1}$ has been obtained in the previous step, $\beta$ is updated by minimizing $L_\rho(\alpha, \beta, \epsilon)$ with $\alpha^{j+1}$ and $\epsilon^j$ given. The optimization problem of this step is expressed as:

$$\beta^{j+1} = \arg\min_{\beta} L_\rho(\alpha^{j+1}, \beta, \epsilon^j).$$

The computational complexity of this problem is O(N).

3) After the values of $\alpha^{j+1}$ and $\beta^{j+1}$ have been calculated, the value of $\epsilon^{j+1}$ is updated, as shown in the following formula:

The computational complexity of this step is O(N).

The above three steps are performed iteratively until both the absolute error and the relative error are less than given thresholds. With this ADMM-based method, problem P2 can be solved by the ECRA algorithm shown in FIG. 6. Convergence of the algorithm is guaranteed, and its convergence behavior depends on $\rho$. According to the above analysis of the computational complexity of each step, the overall complexity of the algorithm is O(N). It is worth noting that, because the original problem is non-convex, the algorithm is not guaranteed to find the optimal solution of the original problem; however, the error between the approximate solution obtained and the optimal solution is within a controllable range.
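The three-step iteration and its stopping rule can be illustrated on a deliberately small convex problem. The sketch below is not the ECRA algorithm: it applies scaled-form ADMM to a toy problem, minimizing 0.5 * ||x - a||^2 subject to box constraints, which is split as x = z with z confined to the box. The problem, the function name and the tolerance are assumptions chosen for illustration.

```python
import math


def admm_box_projection(a, lower, upper, rho=1.0, tol=1e-8, max_iter=1000):
    """Minimize 0.5 * ||x - a||^2 subject to lower <= x <= upper via ADMM."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled multiplier
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Step 1: minimize over x; the problem separates per component.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # Step 2: minimize over z, which projects x + u onto the box.
        z_old = z
        z = [min(upper, max(lower, x[i] + u[i])) for i in range(n)]
        # Step 3: update the multiplier with the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
        # Stop when both the primal and the dual residual are small.
        primal = math.sqrt(sum((x[i] - z[i]) ** 2 for i in range(n)))
        dual = rho * math.sqrt(sum((z[i] - z_old[i]) ** 2 for i in range(n)))
        if primal < tol and dual < tol:
            break
    return z, iterations


solution, rounds = admm_box_projection([-0.5, 0.3, 1.7], 0.0, 1.0)
```

As in the first update step described above, the x-update separates into one independent computation per component, so its cost grows linearly with the number of components. The value of `rho` changes how fast the residuals shrink but not the point the iteration converges to.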

Finally, according to the calculation results of the deep reinforcement learning model and the alternating direction multiplier method model, an effective optimization algorithm can be proposed to train the model until it meets the requirements, and the task offloading model can be obtained.

It should be noted that the entire scheduling optimization method based on reinforcement learning can be represented by FIG. 7, which corresponds to the steps of training the deep reinforcement learning model and the ADMM model. First, initialize the parameters of the critic networks and the action network, the parameters of the target critic networks and the target action network, the experience data of the experience pool, and the parameters of the large-scale UAV-assisted mobile edge computing network model, and set the training round counter t = 1. Second, judge whether the current random probability is less than the preset value; if so, directly output the current action; if not, quantize K candidate solutions, add Gaussian noise to the action, and select the optimal action. Then, calculate the resource and energy allocation variables with the ECRA algorithm, obtain the next state and the immediate reward, store the experience in the experience pool, draw a batch of experiences from the experience pool according to the prioritized experience replay strategy, and update the neural network parameters. Finally, set t = t + 1 and judge whether t is less than T; if so, return to the random probability judgment; if not, end.

Regarding step S320, it should be noted that the specific manner of obtaining the task offloading policy is not limited, and can be set according to actual application requirements. For example, in an alternative example, the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, the task to be processed is input into a preset task offloading model, and the step of obtaining the task offloading strategy may include the following sub-steps :

Enter the pending tasks into the first task offloading model to obtain the first task offloading strategy; input the pending tasks into the second task offloading model to obtain the second task offloading strategy.

Wherein, the first task offloading strategy may include the computation offloading decision variable of each wireless user equipment and the proportion of time spent on wireless charging of the device, and the second task offloading strategy may include the amount of computing resources and the energy allocation ratio allocated to each task uploaded to the edge server.

That is to say, the embodiment of the present application provides an efficient online offloading method in a large-scale mobile edge computing network, including the following sub-steps:

Step 1. Construct a system model for a large-scale mobile edge computing network and provide an optimization objective function based on the execution delay and energy consumption of tasks offloaded by wirelessly charged devices.

Step 2. Decompose the original optimization problem into two sub-problems: 1) computation offloading and energy transmission of the wireless user equipment, and 2) allocation of the edge computing server's computing resources and energy; then design a system optimization framework that addresses the sub-problems with the deep reinforcement learning method and the alternating direction method of multipliers, respectively.

Step 3. For sub-problem 1 in step 2, use a method based on deep reinforcement learning to obtain the computation offloading decision variable of each wireless user equipment and the proportion of time spent on wireless charging of the device.

Step 4. For sub-problem 2 in step 2, use the alternating direction multiplier method to obtain the size of computing resources allocated to each task uploaded to the edge server and the energy allocation ratio.

Step 5. According to the calculation results of Step 3 and Step 4, an effective optimization algorithm is proposed to train the model until the requirements are met.

This application provides a new computation offloading method for mobile edge computing networks. The proposed RLCOET algorithm obtains an efficient offloading strategy by learning from interaction with a dynamic edge computing network environment in which wireless user equipments move. Compared with traditional optimization methods, the method of the present application reduces the need to solve the scheduling optimization through repeated iterative calculations, and enables all tasks to obtain satisfactory computing delay and lower energy consumption. Most existing learning-based methods optimize all scheduling variables together, which may cause convergence difficulties when there are many variables to be solved. By contrast, this algorithm decomposes the entire optimization problem into two sub-problems (computation offloading and energy transfer; computing resource and energy allocation) and solves them separately, which effectively reduces the complexity of the algorithm. By improving the optimal action generation strategy and the experience sampling strategy of the deep reinforcement learning algorithm, the proposed algorithm converges easily and obtains a near-optimal computation offloading strategy in MEC networks with large-scale scheduling variables.

The task offloading method according to one aspect of the present application is based on a mobile edge computing network. However, the network infrastructure may be unavailable (such as at a natural disaster rescue site), network equipment may be sparsely distributed (such as in a field operation environment), or there may be a temporary surge of mobile devices far beyond the network service capacity (such as at a large game or rally). In these cases, given their high maneuverability and flexibility, unmanned aerial vehicles (UAVs) can be used as communication relay stations or edge computing platforms. In recent years, researchers have established communication with users' mobile devices (Mobile Devices, MDs) by deploying wireless communication nodes on UAVs, and have proposed UAV-assisted mobile edge computing (Mobile Edge Computing, MEC) technology for various application scenarios. Once computing resources are deployed on UAVs, a UAV-assisted mobile edge computing network brings many advantages, such as reduced network overhead, lower execution latency of computing tasks, better quality of experience (QoE), and longer battery life of mobile devices.

In the field of UAV-assisted mobile edge computing, the UAV trajectory and the computation offloading decision (whether a computing task is executed locally on the mobile device or offloaded to the edge server) must be determined properly to obtain the desired performance. Specifically, existing research and inventions minimize the computing delay or energy consumption of all mobile devices by optimizing the UAV trajectory, task offloading ratio, and task scheduling to ensure the reliability of the entire edge computing network.

Existing UAV-assisted edge computing systems often only use one or more UAVs as edge computing devices to ensure low latency and reliability of network system computing task transmission. Due to the limitations of the current development of UAV technology and the weak computing power of computing devices deployed in UAVs, it is not enough to use UAV-assisted edge computing networks to provide satisfactory services for multiple mobile devices.

Therefore, a more promising model is to realize the construction of mobile edge computing network among mobile devices, drones and cellular network base stations (cellular base stations, BS). However, some existing edge computing networks composed of mobile devices, UAVs and base stations only contain one UAV. As a result, the computing task requirements of multiple mobile devices cannot be satisfied at the same time, and the task computing delay of the network system is increased.

The following describes a data processing system, a scheduling optimization system, and a scheduling optimization method according to another aspect of the present application in the case of implementing a mobile edge computing network among mobile devices, drones, and cellular network base stations with reference to the accompanying drawings.

FIG. 8 is a structural block diagram of a data processing system 10 provided by other embodiments of the present application, which provides a possible implementation of the data processing system 10. Referring to FIG. 8, the data processing system 10 may include one or more of an electronic device 100 and a scheduling optimization system 300.

The electronic device 100 communicates with the scheduling optimization system 300. The electronic device 100 obtains the pending tasks and locations from the scheduling optimization system 300 and derives a scheduling strategy from them, so that the scheduling optimization system 300 can perform scheduling optimization according to the scheduling strategy.

Optionally, the specific composition of the scheduling optimization system 300 is not limited, and can be set according to actual application requirements. For example, in an alternative example, the scheduling optimization system 300 may include at least one base station, a drone, and a mobile device.

It should be noted that, in an alternative example, the electronic device 100 and the mobile device may be the same device; in another alternative example, the electronic device 100 and the drone may be the same device; in yet another alternative example, the electronic device 100 and the base station may be the same device.

Optionally, the number of base stations is not limited, and can be set according to actual application requirements. For example, in an alternative example, the number of base stations may be one.

That is to say, to address the problems that an edge computing network composed of mobile devices, UAVs and base stations has a high task computation delay and cannot simultaneously serve multiple mobile devices that have computing tasks, this application, with reference to FIG. 9, establishes a mobile edge computing network composed of a single base station, multiple drones, and a large number of mobile devices. Computing tasks generated by mobile devices in the network can be performed on the mobile device itself, offloaded to one of the drones for simple calculations, or further transmitted to the base station for more intensive calculations.

FIG. 10 shows one of the flowcharts of the scheduling optimization method provided by the embodiment of the present application. The method can be applied to the electronic device 100 shown in FIG. 19 (described below), and is executed by the electronic device 100 in FIG. 19 . It can be understood that the scheduling optimization device according to the embodiments of the present application may be implemented by the task offloading device according to some embodiments of the present application. In addition, it should be understood that in other embodiments, the order of some steps in the scheduling optimization method of this embodiment may be exchanged according to actual needs, or some steps may be omitted or deleted. The flow of the scheduling optimization method shown in FIG. 10 will be described in detail below.

Step S410, acquiring the pending tasks and current location information of at least one mobile device.

Wherein, the tasks to be processed include the first task and the second task.

In step S420, the task to be processed and the current location information are input into a preset scheduling optimization model to obtain a scheduling strategy.

Wherein, the scheduling optimization model is obtained by training based on the established initial model.

Step S430, sending the scheduling strategy to at least one mobile device, so that, based on the scheduling strategy, the at least one mobile device sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing.

The above method obtains a scheduling strategy by inputting the pending tasks and current location information into a preset scheduling optimization model, and sends the scheduling strategy to at least one mobile device, so that, based on the scheduling strategy, the at least one mobile device sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. Because the first task is dispatched to the UAV and the second task is dispatched to the base station, the method avoids the problem in the related art that tasks are either all executed locally on the mobile device or all dispatched to the UAV or the base station for remote execution, which leads to inefficient scheduling optimization.

It should be noted that before step S410, the scheduling optimization method provided by the embodiment of the present application may also include the step of obtaining a scheduling optimization model. Referring to FIG. 11, this step may include the following sub-steps:

Step S440, establishing an initial model and an optimization objective function according to the initial parameters of the mobile edge computing network system.

In step S450, the initial model is trained according to the optimization objective function to obtain a scheduling optimization model.

Regarding step S440, it should be noted that the specific ways of establishing the initial model and the optimization objective function are not limited, and can be set according to actual application requirements. For example, in an alternative example, step S440 may include the following sub-steps:

An initial model is established according to the initial parameters of at least one base station, unmanned aerial vehicle and mobile device; an optimization objective function is established according to the initial model.

Among them, the initial model may include the system model, calculation model and communication model of the mobile edge computing network system, and the step of establishing the initial model may include the following sub-steps:

1. Establish a system model:

The network architecture of the system model established in this application is mainly divided into three layers, mobile devices on the ground, drones in the air, and remote base stations. The positions of the three can be represented by a three-dimensional Cartesian coordinate system. The total execution time of the task to be processed is recorded as T, which is evenly divided into N time slices, and the time slice set can be expressed as:

Among them, the length τ of each time slice satisfies τ=T/N, and each time slice is assumed to be short enough that the position of each UAV remains unchanged within it. Considering that computing tasks may become congested, this network system assumes that mobile devices cannot communicate directly with the base station and can offload tasks to the base station only with the help of drones.

In a network system, a collection of mobile devices can be expressed as:

Among them, M represents the number of mobile devices, and the position of mobile device MD ^m in time slice TS ⁿ is given by its coordinates in the horizontal plane where the mobile device is located.

In time slice TS ⁿ , each mobile device MD ^m generates a computation-intensive task, which is characterized by three quantities: the size of the task data (unit: bit), the number of CPU cycles needed to process each bit, and T ^req , the maximum time allowed for executing the task. Without loss of generality, the maximum allowed execution time is the same for all tasks. In addition, the value of T ^req is smaller than τ to ensure that each task can be executed within one time slice.

An onboard CPU is embedded in each mobile device MD ^m . By dynamically adjusting the voltage and frequency of the CPU, the actual CPU frequency of mobile device MD ^m in time slice TS ⁿ can be adaptively controlled to improve energy utilization efficiency. This actual frequency must not exceed the maximum computing frequency of the CPU, and all mobile devices are assumed to have the same maximum computing frequency.

In this system, the set of drones can be expressed as:

Among them, U represents the number of UAVs, and the position of UAV ^u in time slice TS ⁿ is given by its coordinates in the horizontal plane where the UAV is located, together with H, the height of the drone.

Assuming that the flight speed of each UAV does not exceed V _max , this can be expressed as:

$\lVert v_u(n) \rVert \le V_{\max}$

where v _u (n) denotes the velocity of UAV ^u in time slice TS ⁿ . In addition, in order to ensure the flight safety of drones, the distance between any two drones should be greater than the minimum allowable distance d _min , namely:

The energy consumption of UAV ^u in time slice TS ⁿ can be expressed as:

where M _g denotes the weight of UAV ^u .

Each UAV can be deployed as an edge server with a maximum computing capacity. In time slice TS ⁿ , the CPU computing resources that UAV ^u allocates to the computing tasks uploaded to it for execution must not exceed this maximum, and all drones are assumed to have the same maximum computing capacity.

The location of the base station can be expressed as:

Wherein, x ^BS and y ^BS represent the coordinates of the horizontal plane where the base station is located. Because both the base station and the drones are located at height, they are connected through a line-of-sight wireless transmission link, and the base station is not directly connected to the mobile devices. In this case, the UAV acts as a relay, forwarding the tasks offloaded by the mobile device to the base station for further calculation. Since the base station has a powerful computing server and energy supply, the execution time of a computing task at the base station is negligible, and the energy consumption of tasks performed on the base station is not considered.

All computing tasks in this system are offloaded in full, that is, each computing task is either executed entirely locally, or offloaded entirely to UAV ^u , or further offloaded entirely to the base station for execution. Binary task scheduling decision variables represent how each computing task is offloaded: for each task and each computing platform k, the corresponding variable is 1 if the task is offloaded to computing platform k and 0 otherwise.

It is worth noting that when a task is executed on mobile device MD ^m or on a drone, exactly one of its scheduling decision variables has the value 1 and all the others are 0. When a task is executed on the base station, the variable for the base station is 1 and the variable for the drone through which the task is offloaded must also be 1, because one of the drones has to act as a relay from the mobile device to the base station. In summary, the scheduling decision variables should meet the following constraints:

In addition, it is assumed that each UAV can offload at most one task to the BS for further execution in each time slice. Therefore, the scheduling decision variables should also satisfy:

It should be added that, with the scheduling decision variables introduced, the constraints on the computing resources allocated by mobile devices and UAVs become:

2. Establish a calculation model:

Computing tasks can be performed on mobile devices, drones, and base stations; these cases are called local computing, drone-side computing, and BS-side computing, respectively. If a task is computed locally, its computation time is the number of CPU cycles the task requires (data size multiplied by cycles per bit) divided by the actual CPU frequency of the mobile device, and the energy consumed depends on that CPU frequency and on the computation time, where κ ^m and v _m are positive coefficients depending on the CPU in mobile device MD ^m .

If a computing task is offloaded to the drone UAV ^u for execution, the computation time of the task is:

in,

The corresponding energy consumption is:

where κ ^u and v _u are positive coefficients depending on the CPU in UAV ^u . It is worth noting that each computing task can be offloaded to only one of the drones.

If a task is executed at the base station, then, according to the assumption of the strong computing power and energy supply capability of the base station, the execution time of this task is approximately zero, and the energy consumption generated by the task is not considered.
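The local computing case can be illustrated with a minimal Python sketch. The time formula follows directly from the task definition (bits, cycles per bit, CPU frequency). The energy formula, power equal to κ·f^v multiplied by the execution time, is an assumed common model and not necessarily the exact expression of this application. All function names and numeric values are hypothetical.

```python
def local_compute_time(data_bits: float, cycles_per_bit: float, cpu_hz: float) -> float:
    """Seconds needed to process a task: total CPU cycles divided by CPU frequency."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return data_bits * cycles_per_bit / cpu_hz


def compute_energy(kappa: float, v: float, cpu_hz: float, seconds: float) -> float:
    """Assumed energy model: power = kappa * f**v, energy = power * time."""
    return kappa * cpu_hz ** v * seconds


# Hypothetical numbers, chosen only for illustration.
t_local = local_compute_time(data_bits=1e6, cycles_per_bit=1000, cpu_hz=1e9)
e_local = compute_energy(kappa=1e-27, v=3, cpu_hz=1e9, seconds=t_local)
```

Under this assumed model, lowering the CPU frequency lengthens the computation time but reduces the energy, which is the trade-off that the adaptive frequency control exploits.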

3. Establish a communication model:

The communication links of the entire network system are divided into two types: the link between the mobile device and the UAV, and the link between the UAV and the base station. In order to avoid possible communication interference between UAVs, each UAV is assigned an orthogonal communication frequency. Due to the high altitude of UAVs, the wireless communication channel between UAVs and mobile devices or base stations is mainly a line-of-sight wireless transmission channel.

In time slice TS ⁿ , the distance between mobile device MD ^m and UAV ^u is:

In the time slice TS ⁿ , the distance between the UAV ^u and the base station is:

Therefore, the wireless channel gain between the mobile device MD ^m and the UAV ^u is:

The wireless channel gain between the UAV ^u and the base station is:

Among them, g _o is the received power gain at the reference distance of 1 meter.

If a computing task is offloaded from mobile device MD ^m to unmanned aerial vehicle UAV ^u , the transmission rate of the task data is:

If a computing task is offloaded from UAV ^u to the base station, the transmission rate of the task data is:

Among them, B represents the bandwidth of the network system and σ ² represents the communication noise power. The wireless transmission powers of the mobile device MD ^m and the UAV ^u in time slice TS ⁿ must not exceed the maximum available transmission powers of the mobile device MD ^m and the UAV ^u , respectively.
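The communication model can be sketched in Python as follows. The sketch assumes the usual line-of-sight forms: a channel gain of g_0 divided by the squared distance, and a Shannon-type rate B·log2(1 + p·g/σ²). These are common modelling choices and may differ in detail from the expressions of this application. All function and parameter names are hypothetical.

```python
import math


def channel_gain(g0: float, tx_xyz, rx_xyz) -> float:
    """Assumed line-of-sight gain: reference gain at 1 m over squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(tx_xyz, rx_xyz))
    if d2 == 0:
        raise ValueError("transmitter and receiver must not coincide")
    return g0 / d2


def transmission_rate(bandwidth_hz: float, tx_power_w: float,
                      gain: float, noise_w: float) -> float:
    """Assumed Shannon-type rate in bit/s."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / noise_w)


def offload_cost(data_bits: float, rate_bps: float, tx_power_w: float):
    """Time (s) and energy (J) spent transmitting a task at a fixed power."""
    seconds = data_bits / rate_bps
    return seconds, tx_power_w * seconds
```

For example, `offload_cost(2e6, transmission_rate(1e6, 1.0, 3.0, 1.0), 0.5)` gives the time and energy a device would spend sending a 2 Mbit task under these illustrative values.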

The time and energy consumed by the mobile device MD ^m to offload computing tasks to the UAV ^u are:

The time and energy consumed by the unmanned aerial vehicle UAV ^u to offload computing tasks to the base station are:

Each mobile device MD ^m and each UAV ^u has an energy budget, and its energy consumption must satisfy the following constraint:

The optimization goal of this network system is to minimize the total energy consumption of mobile devices and UAVs under task delay constraints and system constraints (such as the maximum speed of UAVs, the minimum distance between UAVs and the maximum computing power). When a computing task is executed on a mobile device, a UAV or the base station, the corresponding task delay is expressed as follows:

After the task scheduling decision variable is introduced, the delay of a computing task can be uniformly expressed as:

Therefore, the corresponding task execution delay constraint is:

In time slice TS ⁿ , the energy that a mobile device consumes when its task is executed falls into two categories:

1) If the task is executed locally on the mobile device MD ^m , the energy consumption of the mobile device MD ^m is the energy of its local computation.

2) If the task is offloaded to UAV ^u or the base station for execution, the energy consumption of the mobile device MD ^m is the energy it spends transmitting the task to the UAV.

Therefore, the energy consumed by the mobile device MD ^m in performing its computing task can be uniformly expressed as:

The energy consumption of all mobile devices during task execution can be expressed as:

In summary, in order to minimize the total energy consumption of all tasks of mobile devices during the operation of the mobile edge computing network system, the optimization problem (optimization objective function) is defined as follows:

s.t. C1: Eq(1) and Eq(2),

C2: Eq(3), Eq(4) and Eq(5),

C3: Eq(6) and Eq(7),

C4: Eq(8),

C5: Eq(9) and Eq(10),

C6: Eq(11),

where L, A, P and F are the variables to be optimized.

In problem P, constraint C1 states that the speed of the drones and the distance between drones should respect the maximum speed and minimum distance limits, respectively. Constraint C2 guarantees that the computing task generated by a mobile device in each time slice can be executed on only one of the local mobile device, a UAV or the base station, and that each UAV can send at most one task to the base station in each time slice. Constraint C3 ensures that the computing resources allocated to local computing and UAV computing in each time slice do not exceed the maximum computing capabilities of mobile devices and UAVs, respectively. Constraint C4 states that mobile devices and drones should not exceed their corresponding energy budgets during execution. Constraint C5 states that the transmit power allocated by mobile devices and drones cannot exceed the maximum allowable value. Constraint C6 ensures that the execution of each task meets the delay requirement.
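As an illustration of constraint C2, the following Python sketch checks whether a binary scheduling decision is admissible. The column layout of each row (local execution first, then one column per UAV, then the base station) and the function name are assumptions made for this example and are not taken from the application.

```python
def is_valid_schedule(decisions, num_uavs: int) -> bool:
    """Check the task association rules for one time slice.

    decisions[m] is a 0/1 row for mobile device m with the assumed layout
    [local, UAV 1, ..., UAV U, base station].
    """
    bs = num_uavs + 1
    relayed = [0] * num_uavs  # tasks each UAV forwards to the base station
    for row in decisions:
        if len(row) != num_uavs + 2 or any(x not in (0, 1) for x in row):
            return False
        local, uav_flags, to_bs = row[0], list(row[1:bs]), row[bs]
        if to_bs:
            # A base-station task needs exactly one relaying UAV.
            if local or sum(uav_flags) != 1:
                return False
            relayed[uav_flags.index(1)] += 1
        elif local + sum(uav_flags) != 1:
            # Otherwise exactly one platform executes the task.
            return False
    # Each UAV may forward at most one task per time slice.
    return all(count <= 1 for count in relayed)
```

A row such as `[0, 0, 1, 1]` with two UAVs means that the task is relayed by the second UAV and executed at the base station.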

Regarding step S450, it should be noted that the specific manner of training the model is not limited, and can be set according to actual application requirements. For example, in an alternative example, the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model, and a resource allocation model, and step S450 may include the following sub-steps:

The optimization objective function is split to obtain a first optimization objective function, a second optimization objective function and a third optimization objective function. The initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, according to the second optimization objective function to obtain the computing task joint scheduling model, and according to the third optimization objective function to obtain the resource allocation model.

In detail, problem P is difficult for the following main reasons: 1) since A is a discrete binary variable while L, P and F are continuous variables, the problem is a mixed-integer nonlinear programming problem, which is NP-hard; 2) because the network system must respond quickly, the scheduling optimization algorithm has to make real-time scheduling decisions in each time slice; 3) since the positions of mobile devices and UAVs change, P needs to be solved in a dynamically changing environment. For these reasons, this application decomposes the optimization objective function P into three sub-problems: UAV trajectory planning (P1, the first optimization objective function), joint scheduling of computing tasks (P2, the second optimization objective function), and computing and transmission resource allocation (P3, the third optimization objective function). In this way an efficient mobile edge computing network scheduling strategy can be obtained, and the complexity of solving the optimization problem is greatly reduced.

In order to reduce the computational complexity of the original optimization problem, P is split into the following three sub-problems:

1. UAV trajectory planning:

Among the scheduling variables L, A, P and F to be optimized in problem P, the trajectory position L of the UAV depends only weakly on the other three variables. The optimization of this variable is mainly based on position observations of the mobile devices, and its goal is for each UAV to be as close as possible to the mobile devices and the base station. Therefore, the UAV trajectory optimization can be expressed as:

s.t. C1: Eq(1) and Eq(2);

in,

Indicates a cluster composed of mobile devices within the service range provided by UAV ^u , and meets the conditions

2. Joint scheduling of computing tasks:

Once the position L of the UAV is determined in time slice TS ⁿ , the task offloading decision variable A needs to be optimized before the variables P and F. Optimizing A for the current mobile device clusters, with the goal of minimizing the maximum computation delay over all tasks, makes it easier to satisfy constraint C6 of the original problem P, so the joint scheduling sub-problem of computing tasks can be expressed as:

s.t. C2: Eq(3), Eq(4) and Eq(5),

3. Allocation of computing and transmission resources:

After solving problem P1 and problem P2, under the constraints of C3, C4 and C5, the remaining variables P and F are optimized as follows with the goal of minimizing energy consumption in the system:

s.t. C3: Eq(6) and Eq(7),

C4: Eq(8),

C5: Eq(9) and Eq(10),

Based on the decomposition of the above problems, the optimization framework proposed by this application is shown in Figure 12. The framework consists of three models: a UAV trajectory planning model (UAV Trajectory Planning, UTP), a computing task joint scheduling model (Task Association Scheduling, TAS), and a computing and transmission resource allocation model (Resource Allocation, RA), which correspond to the optimization sub-problems P1, P2 and P3, respectively. At the beginning of each time slice, the network system environment generates two state variables: the first is the input of the UTP model, and the second is the input of the TAS and RA models.

1) The UTP model processes its input state. Since the locations of the mobile devices differ from one time slice to the next, the UTP model predicts the movement of the mobile devices and guides the UAVs to appropriate positions. Since the motion of the mobile devices follows neither a Gaussian nor a linear distribution, this application can use a long short-term memory network to model the motion distribution of the mobile devices. After the prediction is completed, the mobile devices need to be divided into U clusters, where U is the number of drones, so that each drone can serve the mobile devices in its cluster. To allow soft clustering, i.e. each mobile device can be served by different drones in different time slices (but by no more than one drone in the same time slice), the UTP model adopts the fuzzy C-means clustering method and clusters according to the similarity of channel power gains. After clustering, the center point of each cluster is output by the UTP model as the position to which the corresponding UAV moves.

2) The TAS model receives its inputs from the UTP model and from the network environment, and generates the values of the task scheduling decision variables according to the time-varying channel conditions and the computing task requirements. This application can use an advanced deep reinforcement learning (DRL) method, the deep deterministic policy gradient algorithm (Deep Deterministic Policy Gradient, DDPG), which gains experience through the interaction between the algorithm model and the environment and outputs the optimized decision-making action a _n . In other alternative examples, other reinforcement learning algorithms suitable for continuous actions (such as the TD3 algorithm or the PPO algorithm) can also be used. For each time slice, the output action a _n is a one-dimensional vector, each item of which is a continuous variable relaxed to the range between 0 and 1. Each item of a _n can be viewed as the probability that a computing task is executed on computing device k (this is why each item is set to a continuous value between 0 and 1). Since the task scheduling decision variables should be two-dimensional and binary, all items of a _n are reshaped and rounded to 1 or 0 according to the task association constraints of the optimization problem, and the result is used as the output of the TAS model.
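One possible way to reshape and round a _n is sketched below in Python. It is not the algorithm of Figure 18, whose details are not reproduced here. The sketch assumes that a _n has one score per mobile device and platform, laid out as [local, UAV 1, ..., UAV U, base station]; it picks the highest-scoring platform per device, and falls back to the best non-base-station platform when no UAV is free to relay. The function name and this fallback rule are hypothetical.

```python
def integrate_actions(a_n, num_devices: int, num_uavs: int):
    """Turn a flat vector of scores in [0, 1] into binary scheduling rows."""
    k = num_uavs + 2  # assumed layout: local, U UAVs, base station
    if len(a_n) != num_devices * k:
        raise ValueError("a_n must hold num_devices * (num_uavs + 2) items")
    rows = [a_n[m * k:(m + 1) * k] for m in range(num_devices)]  # reshape
    busy_relays = set()  # UAVs already forwarding a task in this time slice
    decisions = []
    for row in rows:
        out = [0] * k
        best = max(range(k), key=lambda j: row[j])
        free = [u for u in range(num_uavs) if u not in busy_relays]
        if best == k - 1 and free:
            # Base station chosen: mark it and the best free UAV as relay.
            relay = max(free, key=lambda u: row[1 + u])
            busy_relays.add(relay)
            out[k - 1] = 1
            out[1 + relay] = 1
        else:
            if best == k - 1:
                # No relay left: fall back to the best non-BS platform.
                best = max(range(k - 1), key=lambda j: row[j])
            out[best] = 1
        decisions.append(out)
    return decisions
```

Processing the devices in index order is a simplification; a different priority order would resolve relay conflicts differently.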

3) The RA model takes its inputs and performs the final processing. According to sub-problem P3, the optimization variables P and F can be solved directly with the CVXPY convex optimization toolkit, and the P and F output by the RA model are applied to the environment.

The environment receives the actions output by the above three models and generates a reward r _n (used as the input of DDPG) and a new state (the components of the state are sent to the corresponding components of the algorithm framework). Thereafter, the algorithm enters the next time slice and repeats the above three steps.
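The data flow of one time slice can be summarized with the following Python sketch. The three models and the environment are passed in as plain callables and objects; their interfaces (`reset`, `step`, and the argument order) are hypothetical and chosen only to show the order of the three steps.

```python
def run_time_slices(env, utp, tas, ra, num_slices: int):
    """Run the UTP -> TAS -> RA pipeline for a number of time slices.

    env.reset() returns (location_state, task_state); env.step(...) returns
    ((location_state, task_state), reward). Both interfaces are assumptions.
    """
    rewards = []
    location_state, task_state = env.reset()
    for _ in range(num_slices):
        uav_positions = utp(location_state)                       # step 1: trajectory planning
        decisions = tas(uav_positions, task_state)                # step 2: task scheduling
        powers, freqs = ra(uav_positions, decisions, task_state)  # step 3: resource allocation
        (location_state, task_state), reward = env.step(
            uav_positions, decisions, powers, freqs
        )
        rewards.append(reward)  # the reward is what a DDPG learner would consume
    return rewards
```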

It should be noted that the optimal location plan of the UAV can be calculated with a long short-term memory network and fuzzy C-means clustering. The trajectory planning of the UAV can be divided into two parts: mobile device motion prediction and mobile device clustering.

In the network system, the distance between the UAV and the mobile device is the main factor affecting the other scheduling variables, so the ideal trajectory of the UAV is to move gradually towards the mobile devices and get as close to them as possible. To this end, the algorithm proposed in this application predicts the locations of the mobile devices to assist the movement of the drone. Since this prediction is mainly based on the positions of the mobile devices in previous time slices, this application uses the LSTM recurrent neural network to model the time-series distribution of those positions.

As shown in Figure 13, the long short-term memory (LSTM) network is a recurrent neural network that accepts an external input and feedback inputs (C _n-1 and h _n-1 ). The output of the LSTM includes two items (C _n and h _n ), which are fed back into the LSTM itself for processing in the next time slice. Of these two output terms, C _n is obtained by:

Among them, f _n , i _n and the candidate cell state represent output values of the neural network layers, σ and tanh represent the sigmoid and hyperbolic tangent activation functions respectively, W _f , W _i and W _C represent the network weights of the corresponding neural network layers, and b _f , b _i and b _C represent the corresponding bias vectors. The weights and biases are the parameters that the neural network needs to learn.

Based on C _n , h _n is calculated by the following formula:

h _n = o _n *tanh(C _n );

Among them, W _O and b _o are the parameters that the neural network needs to learn.
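For reference, the standard LSTM cell update can be written with the symbols named in this description as follows. The symbols x_n for the external input and \tilde{C}_n for the candidate cell state are assumed names, and the layout of the application's own equations may differ.

```latex
\begin{aligned}
f_n &= \sigma\left(W_f \cdot [h_{n-1}, x_n] + b_f\right), \\
i_n &= \sigma\left(W_i \cdot [h_{n-1}, x_n] + b_i\right), \\
\tilde{C}_n &= \tanh\left(W_C \cdot [h_{n-1}, x_n] + b_C\right), \\
C_n &= f_n * C_{n-1} + i_n * \tilde{C}_n, \\
o_n &= \sigma\left(W_O \cdot [h_{n-1}, x_n] + b_o\right), \\
h_n &= o_n * \tanh(C_n).
\end{aligned}
```

Here * denotes element-wise multiplication and [h_{n-1}, x_n] denotes the concatenation of the previous hidden state and the current input.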

Based on the above formula, this application proposes an LSTM-based mobile device location observation model to predict the location of the mobile device, and its time series expansion is shown in FIG. 14 . In each time slice, the current location of the mobile device is input to the LSTM network, and the LSTM outputs h _n . In order to predict the location of the mobile device in the next time slice, a fully connected layer is also added to the output to fine-tune h _n as follows:

Among them, relu is the ReLU activation function, and the weight and bias of the fully connected layer are variables that are learned during the training of the neural network.

Based on the predicted locations of the mobile devices in the next time slice, the mobile devices need to be clustered into U groups so that the UAVs can serve them in a load-balanced manner. To cluster the mobile devices, the FCM method, which is rooted in fuzzy theory, assigns to each mobile device MD ^m and each cluster u a metric value d _m,u in time slice TS ⁿ⁺¹ , calculated as follows:

Among them, c _u represents the position of the UAV in the nth time slice, c _k represents the center point of the kth cluster, namely

By minimizing the objective function O to be optimized, iteratively solve the values of d _{m, u} and c _k until the difference between the two continuously calculated metric values is less than the specified threshold ε _c :

Before iterating, all c _u should be initialized. Each c _u is initialized from the previous center point because mobile devices can only move within a small range, so their new center points are likely to be close to the previous ones (these center points are planned as the positions to which the drones move).

At the end of the iteration, each mobile device MD ^m is assigned a metric d _m,u representing its membership in the u-th cluster. d _m,u can be further converted into a binary clustering decision by an exploration strategy, which reduces the possibility of getting stuck in a local minimum of the optimization objective O. With ε _c denoting the exploration threshold, mobile device MD ^m is assigned with probability 1−ε _c to the cluster with the largest metric value, and with probability ε _c to one of the other clusters. The algorithm in Figure 15 (Algorithm 1) describes in detail the FCM-based clustering process of mobile devices in the nth time slice. The output c _u of Algorithm 1 guides the movement of the UAVs.
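The clustering step can be sketched in Python with the textbook fuzzy C-means updates followed by the exploration rule described above. This is an illustration and not Algorithm 1 of Figure 15: it measures similarity by Euclidean distance instead of channel power gain, and the function names, the fuzzifier value and the stopping rule are assumptions.

```python
import math
import random


def fuzzy_c_means(points, centers, fuzzifier=2.0, tol=1e-4, max_iter=100):
    """Return (memberships, centers) for 2-D points, starting from given centers."""
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (fuzzifier - 1.0)
    previous = None
    memberships = []
    for _ in range(max_iter):
        # Membership of each point in each cluster (every row sums to 1).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((du / dk) ** exponent for dk in dists) for du in dists]
            )
        # Each center becomes the membership-weighted mean of the points.
        new_centers = []
        for u in range(len(centers)):
            weights = [row[u] ** fuzzifier for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[i] for w, p in zip(weights, points)) / total
                for i in range(2)
            ))
        centers = new_centers
        # Stop when no membership changed by more than the threshold.
        if previous is not None and max(
            abs(a - b)
            for new_row, old_row in zip(memberships, previous)
            for a, b in zip(new_row, old_row)
        ) < tol:
            break
        previous = memberships
    return memberships, centers


def assign_clusters(memberships, explore_prob, rng):
    """Pick the largest membership, or another cluster with probability explore_prob."""
    assignment = []
    for row in memberships:
        best = max(range(len(row)), key=row.__getitem__)
        others = [u for u in range(len(row)) if u != best]
        if others and rng.random() < explore_prob:
            assignment.append(rng.choice(others))
        else:
            assignment.append(best)
    return assignment
```

Passing the previous cluster centers as the `centers` argument mirrors the initialization described above, where each new center is expected to lie close to the previous one.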

It should be noted that the task scheduling decision variables of each mobile device can be obtained with the deep deterministic policy gradient algorithm, which is based on reinforcement learning. The joint scheduling of computing tasks includes two parts: DDPG-based optimization of the task scheduling decision variables, and integration of the scheduling variables. After the trajectory of the UAV is known, the algorithm framework uses the DDPG reinforcement learning algorithm to learn the scheduling strategy of computing tasks, namely:

Policy π is a mapping function from environment state to decision-making action, and the state of the network environment is:

The decision-making action output by policy π is:

Each component of a _n is a continuous variable from 0 to 1, whose magnitude is:

With reinforcement learning, an approximately optimal solution to policy π can be obtained by maximizing the total utility value (also known as the Q value):

Here, s_{n+1} is the new state of the environment after decision-making action a_n is taken in state s_n, r_n(s_n, a_n) = (ε_n)^{-1} is the immediate reward of time slice TS^n, and γ is the discount factor for future rewards. Since the state and action spaces of the environment are high-dimensional, two neural networks are used: an actor network (Actor) π with parameters ω and a critic network (Critic) Q with parameters θ, as shown in Figure 16. To make the learning process more stable, target networks can be used (the target actor network and the target critic network are respectively represented by

and

as parameters) to update the parameters periodically.

In time slice TS^n, the environment transitions from state s_n to state s_{n+1} after accepting the action a_n output by the algorithm model, and generates a reward r_n. These four items are packed into a tuple (s_n, a_n, s_{n+1}, r_n) and stored in an experience replay buffer. During training, a batch of samples is randomly selected from the experience replay buffer, and the critic network (i.e., parameters θ) is trained according to the following loss function:

The actor network minimizes the following gradient function for parameter training:

where

is the state sampled from the state distribution under the current policy π,

is the number of samples in a batch during backpropagation training of the network. The DDPG-based joint task scheduling training algorithm is detailed in Figure 17.
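The training loop described above can be sketched as follows. The sketch assumes the standard DDPG critic objective (mean squared temporal-difference error against target networks) and a soft target update with a hypothetical mixing factor `tau`; neither is confirmed by this text as the exact form used in the application. The networks are passed in as plain callables, and all names (`ReplayBuffer`, `immediate_reward`, `critic_loss`, `soft_update`) are illustrative.

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience replay buffer holding (s_n, a_n, s_{n+1}, r_n) tuples."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)  # oldest tuples are dropped first
        self._rng = random.Random(seed)

    def store(self, state, action, next_state, reward):
        self._items.append((state, action, next_state, reward))

    def sample(self, batch_size):
        # Uniform random batch without replacement.
        size = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), size)

    def __len__(self):
        return len(self._items)


def immediate_reward(epsilon_n):
    """r_n = (epsilon_n)^-1, as stated in the text."""
    return 1.0 / epsilon_n


def critic_loss(batch, critic, target_actor, target_critic, gamma):
    """Mean squared TD error (standard DDPG form, assumed here)."""
    total = 0.0
    for state, action, next_state, reward in batch:
        # Target value is computed with the target networks for stability.
        y = reward + gamma * target_critic(next_state, target_actor(next_state))
        total += (y - critic(state, action)) ** 2
    return total / len(batch)


def soft_update(target_params, online_params, tau):
    """Move target parameters toward the online ones; tau = 1 is a hard copy."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full implementation the critic would be updated by gradient descent on `critic_loss` and the actor by the policy gradient, typically with an automatic differentiation library; those steps are omitted here.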

Since the decision-making action a_n output by the actor network is a one-dimensional vector, and each component of a_n is a continuous value ranging from 0 to 1, a_n needs to be reshaped into two dimensions and converted to the integer values 0 or 1 for further task scheduling. Figure 18 shows the reshaping and integration algorithm for a_n; the time complexity of the algorithm is

After the task scheduling variables have been reshaped and integrated as described above, the output a[m][k] of Algorithm 3 is passed to the RA (resource allocation) module for optimized resource allocation.
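A minimal sketch of the reshaping and integration step is given below. It assumes that the flat action vector is laid out device by device and that each mobile device is scheduled to exactly one target, chosen as the largest component in its row; the actual rule of Algorithm 3 is not reproduced in this text, so this is only one plausible reading. The function name and parameters are hypothetical.

```python
def reshape_and_binarize(a_n, num_devices, num_targets):
    """Reshape a flat action vector into a[m][k] and round each row to one-hot.

    Assumption: row m holds the scores of device m for each of the
    num_targets execution targets, and the largest score wins.
    """
    if len(a_n) != num_devices * num_targets:
        raise ValueError("action vector length does not match M * K")
    decisions = []
    for m in range(num_devices):
        row = a_n[m * num_targets:(m + 1) * num_targets]
        best = max(range(num_targets), key=row.__getitem__)
        decisions.append([1 if k == best else 0 for k in range(num_targets)])
    return decisions
```

For example, with two devices and three targets, `reshape_and_binarize([0.1, 0.7, 0.2, 0.9, 0.05, 0.3], 2, 3)` assigns the first device to target 1 and the second device to target 0.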

It should be noted that a method based on convex optimization can be used to determine the allocation of computing and transmission resources in the network system.

and

are passed as input to the RA module for final processing. According to sub-problem P3, the optimization variables P and F can be solved directly by convex optimization using the CVXPY tool.

Regarding step S420, it should be noted that the specific manner of obtaining the scheduling policy is not limited, and can be set according to actual application requirements. For example, in an alternative example, step S420 may include the following sub-steps:

The current location information is input into the UAV trajectory planning model to calculate the predicted location information of at least one mobile device; the pending tasks and the predicted location information are input into the task joint scheduling model to calculate the task scheduling decision variables of the at least one mobile device; and the pending tasks and the task scheduling decision variables are input into the resource allocation model to calculate the scheduling strategy.
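The three sub-steps form a simple pipeline, sketched below. The three models are represented as arbitrary callables, since their concrete interfaces are not specified here; the function and parameter names are hypothetical.

```python
from typing import Any, Callable, Sequence


def compute_scheduling_strategy(
    pending_tasks: Sequence[Any],
    current_locations: Sequence[Any],
    trajectory_model: Callable[[Sequence[Any]], Any],
    joint_scheduling_model: Callable[[Sequence[Any], Any], Any],
    resource_allocation_model: Callable[[Sequence[Any], Any], Any],
) -> Any:
    """Chain the three models in the order given for step S420."""
    # 1. Trajectory planning: current locations -> predicted locations.
    predicted_locations = trajectory_model(current_locations)
    # 2. Joint scheduling: tasks + predicted locations -> decision variables.
    decisions = joint_scheduling_model(pending_tasks, predicted_locations)
    # 3. Resource allocation: tasks + decision variables -> scheduling strategy.
    return resource_allocation_model(pending_tasks, decisions)
```

Keeping the models as injected callables reflects the decomposition into three sub-problems: each stage can be trained or replaced independently as long as its inputs and outputs keep the same shape.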

Wherein, the specific method of inputting the current location information into the trajectory planning model of the UAV to calculate the predicted location information of at least one mobile device is not limited, and can be set according to actual application requirements. For example, in an alternative example, this step may include the following sub-steps:

Perform motion prediction processing according to the current location information to obtain the next location information of at least one mobile device; perform clustering processing on the next location information of the at least one mobile device to obtain predicted location information.

It should be noted that the steps of performing prediction processing and clustering processing can refer to the process of obtaining the UAV trajectory planning model through training above.

The specific manner of inputting the pending tasks and predicted location information into the task joint scheduling model to calculate the task scheduling decision variables of at least one mobile device is not limited, and can be set according to actual application requirements. For example, in an alternative example, this step may include the following sub-steps:

The task joint scheduling training process is performed according to the pending tasks and the predicted position information, and the decision-making action of at least one mobile device is obtained; the decision-making action is integrated, and the task scheduling decision variable is obtained.

It should be noted that the steps of performing training processing and integration processing can refer to the above-mentioned process of training and obtaining a joint scheduling model for computing tasks.

Through the above method, this application deploys a mobile edge computing network consisting of a single base station, multiple UAVs and a large number of mobile devices. Each computing task can be executed on the mobile device, offloaded to a UAV for computation, or further offloaded to the base station, with the UAV acting as a relay, for more intensive computation. With the goal of minimizing the energy consumption of the network system, a joint optimization problem covering UAV trajectory, task association, and computing and transmission resource allocation is formulated. In view of the high complexity of the problem, this application decomposes the optimization problem into three sub-problems, which greatly reduces the energy consumption of the overall network system, prolongs the lifetime of the network, reduces the computation delay of all mobile devices in the communication network, and improves the quality of service for computing-intensive applications.

Please refer to FIG. 19 , which is a schematic block diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 in this embodiment may be a server capable of data interaction and processing, a processing device, a processing platform, and the like. The electronic device 100 includes a first memory 110 , a first processor 120 and a communication module 130 . The components of the first memory 110 , the first processor 120 and the communication module 130 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines.

The first memory 110 is used to store programs or data. The first memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.

The first processor 120 is used for reading/writing data or programs stored in the first memory 110 and performing corresponding functions. The communication module 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through the network, and is used to send and receive data through the network.

It should be understood that the structure shown in FIG. 19 is only a schematic structural diagram of the electronic device 100, and the electronic device 100 may also include more or fewer components than those shown in FIG. 19 , or have a configuration different from that shown in FIG. 19 . Each component shown in FIG. 19 may be implemented using hardware, software, or a combination thereof.

Referring to FIG. 20 , the embodiment of the present application further provides a task offloading device 400 , and the functions implemented by the task offloading device 400 correspond to the steps performed by the above task offloading method. The task offloading apparatus 400 can be understood as a processor of the above-mentioned electronic device 100 , and can also be understood as a component independent of the above-mentioned electronic device 100 or the processor that implements the functions of the present application under the control of the electronic device 100 . Wherein, the task offloading apparatus 400 may include a task acquiring module 410 , an offloading policy acquiring module 420 and an offloading policy sending module 430 .

The task acquisition module 410 may be configured to acquire pending tasks of at least one first device 210, wherein the pending tasks include a target task. In this embodiment of the application, the task acquisition module 410 may be used to execute step S310 shown in FIG. 3; for relevant content of the task acquisition module 410, please refer to the foregoing description of step S310.

The offloading strategy acquisition module 420 may be configured to input the tasks to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model. In this embodiment of the present application, the offloading strategy acquisition module 420 may be used to execute step S320 shown in FIG. 3; for relevant content of the offloading strategy acquisition module 420, please refer to the foregoing description of step S320.

The offloading policy sending module 430 may be configured to send the task offloading policy to at least one first device 210, so that the at least one first device 210 offloads the target task to the second device 220 based on the task offloading policy, and the second device 220 executes the target task. In this embodiment of the present application, the offloading policy sending module 430 may be used to execute step S330 shown in FIG. 3; for relevant content of the offloading policy sending module 430, please refer to the foregoing description of step S330.

With reference to FIG. 21, some other implementations of the embodiments of the present application further provide a scheduling optimization device 500. It should be understood that the scheduling optimization device described in these other implementations may be implemented as the task offloading device described in some implementations of the present application. In addition, it can be understood that the functions implemented by the scheduling optimization device 500 correspond to the steps performed by the above scheduling optimization method. The scheduling optimization device 500 may include a task acquisition module 510, a scheduling strategy acquisition module 520 and a scheduling strategy sending module 530. The task acquisition module 510 may be configured to acquire pending tasks and current location information of at least one mobile device, wherein the pending tasks include a first task and a second task. In the embodiment of the present application, the task acquisition module 510 may be used to execute step S410 shown in FIG. 10; for relevant content of the task acquisition module 510, please refer to the foregoing description of step S410.

The scheduling strategy acquisition module 520 may be configured to input the pending tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model. In the embodiment of the present application, the scheduling strategy acquisition module 520 may be used to execute step S420 shown in FIG. 10; for relevant content of the scheduling strategy acquisition module 520, refer to the foregoing description of step S420.

The scheduling strategy sending module 530 may be configured to send the scheduling strategy to at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. In the embodiment of the present application, the scheduling strategy sending module 530 may be used to execute step S430 shown in FIG. 10; for relevant content of the scheduling strategy sending module 530, refer to the foregoing description of step S430.

In addition, an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the above-mentioned task offloading method and/or the above-mentioned scheduling optimization method are executed.

The computer program product of the task offloading method provided in the embodiment of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the steps of the task offloading method and/or the scheduling optimization method in the above method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.

To sum up, in the task offloading method and device, electronic device, and storage medium provided by some embodiments of the present application, a task offloading strategy is obtained by inputting the tasks to be processed into a task offloading model, and the task offloading strategy is sent to the first device so that the first device offloads the target task to the second device for processing based on the task offloading strategy. The target task is thereby offloaded to the server for processing, which avoids the problem in the related art of low task offloading efficiency caused by tasks being either all executed locally on the wireless user equipment or all offloaded to the server for remote execution.

In the scheduling optimization method and device, electronic equipment, and storage medium provided by other embodiments of the present application, the scheduling strategy is obtained by inputting the pending tasks and current location information into the preset scheduling optimization model, and the scheduling strategy is sent to at least one mobile device so that the at least one mobile device, based on the scheduling strategy, sends the first task to at least one UAV for processing and forwards the second task through at least one UAV to at least one base station for processing. The first task is thereby scheduled to the UAV for processing and the second task to the base station for processing, which avoids the problem in the related art of low scheduling optimization efficiency caused by tasks being either all executed locally on the mobile device or all scheduled to the UAV or the base station for remote execution.

The above are only preferred embodiments of the present application, and are not intended to limit the present application. For those skilled in the art, there may be various modifications and changes in the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.

Industrial Applicability

The application provides a task offloading method and device, electronic equipment and a storage medium, and relates to the technical field of task offloading. The task offloading method is applied to an electronic device, and the electronic device is connected in communication with a task offloading system. The task offloading system includes a second device and at least one first device. The task offloading method includes: first, obtaining a pending task of at least one first device; second, inputting the pending task into the preset task offloading model to obtain the task offloading strategy; then, sending the task offloading strategy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading strategy, and the second device executes the target task. Through the above method, the efficiency of task offloading can be improved. In addition, the embodiments of the present application also provide a scheduling optimization method and device, electronic equipment, and a storage medium.

In addition, it can be understood that the task offloading method, scheduling optimization method and device, electronic equipment and storage medium of the present application are reproducible and can be used in various industrial applications. For example, the task offloading method, scheduling optimization method and device, electronic device, and storage medium of the present application may be used in the technical field of task offloading and scheduling optimization.

Claims

A task offloading method, characterized in that the task offloading method is applied to an electronic device, the electronic device is connected in communication with a task offloading system, the task offloading system includes a second device and at least one first device, and the task offloading method includes:

Acquiring pending tasks of the at least one first device, wherein the pending tasks include target tasks;

Inputting the task to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;

sending the task offloading policy to the at least one first device, so that the at least one first device offloads the target task to the second device based on the task offloading policy, and the second device executes the target task.
The task offloading method according to claim 1, wherein the task offloading method further comprises the step of obtaining a task offloading model, which step includes:

Establishing a system model and optimizing a cost function according to the cost parameters of the task offloading system;

The system model is trained according to the optimized cost function to obtain a task offloading model.
The task offloading method according to claim 2, wherein the steps of establishing a system model and optimizing a cost function according to cost parameters of the task offloading system include:

building a system model based on cost parameters of said at least one first device and second device;

An optimization cost function is established based on the system model.
The task offloading method according to claim 3, wherein the task offloading model includes a first task offloading model and a second task offloading model, and the step of training the system model according to the optimized cost function to obtain the task offloading model includes:

performing segmentation processing on the optimized cost function to obtain a first optimized cost function and a second optimized cost function;

training the system model according to the first optimization cost function to obtain a first task offloading model;

The system model is trained according to the second optimization cost function to obtain a second task offloading model.
The task offloading method according to claim 4, wherein the task offloading strategy includes a first task offloading strategy and a second task offloading strategy, and the step of inputting the task to be processed into a preset task offloading model to obtain a task offloading strategy includes:

inputting the pending task into the first task offloading model to obtain a first task offloading strategy;

The task to be processed is input into the second task offloading model to obtain a second task offloading policy.
The task offloading method according to claim 4 or 5, wherein the step of training the system model according to the first optimization cost function to obtain a first task offloading model includes:

Establishing a deep reinforcement learning model based on the system model;

The deep reinforcement learning model is trained according to the first optimized cost function to obtain a first task offloading model.
The task offloading method according to any one of claims 4 to 6, wherein the step of training the system model according to the second optimized cost function to obtain a second task offloading model includes:

Establishing an alternating direction multiplier method model based on the system model;

The alternating direction multiplier method model is trained according to the second optimization cost function to obtain a second task offloading model.
A task offloading device, characterized in that the task offloading device is applied to an electronic device, the electronic device is connected in communication with a task offloading system, the task offloading system includes a second device and at least one first device, and the task offloading device includes:

A task acquisition module configured to acquire pending tasks of the at least one first device, wherein the pending tasks include target tasks;

An offloading strategy acquisition module configured to input the task to be processed into a preset task offloading model to obtain a task offloading strategy, wherein the task offloading model is obtained by training based on an established system model;

an offloading policy sending module, configured to send the task offloading policy to the at least one first device, so that the at least one first device offloads the target task to the second device, and the second device executes the target task.
A scheduling optimization method, characterized in that the scheduling optimization method is applied to electronic equipment, and the electronic equipment is connected in communication with a mobile edge computing network system, and the mobile edge computing network system includes at least one base station, unmanned aerial vehicles, and mobile equipment, the scheduling optimization method includes:

Obtain pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;

Inputting the to-be-processed tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;

sending the scheduling strategy to the at least one mobile device, so that the at least one mobile device sends the first task to the at least one drone for processing based on the scheduling strategy, and the second task is forwarded by the at least one drone to the at least one base station for processing.
The scheduling optimization method according to claim 9, characterized in that the scheduling optimization method is implemented by using the task offloading method according to any one of claims 1-7.
The scheduling optimization method according to claim 9 or 10, wherein the scheduling optimization method also includes the step of obtaining a scheduling optimization model, which step includes:

Establishing an initial model and optimizing an objective function according to the initial parameters of the mobile edge computing network system;

The initial model is trained according to the optimization objective function to obtain a scheduling optimization model.
The scheduling optimization method according to claim 11, wherein the step of establishing an initial model and optimizing an objective function according to the initial parameters of the mobile edge computing network system includes:

establishing an initial model based on initial parameters of the at least one base station, UAV, and mobile device;

An optimization objective function is established according to the initial model.
The scheduling optimization method according to claim 11 or 12, wherein the scheduling optimization model includes a UAV trajectory planning model, a computing task joint scheduling model and a resource allocation model, and the step of training the initial model according to the optimization objective function to obtain the scheduling optimization model includes:

performing split processing on the optimization objective function to obtain a first optimization objective function, a second optimization objective function and a third optimization objective function;

The initial model is trained according to the first optimization objective function to obtain the UAV trajectory planning model, and the initial model is trained according to the second optimization objective function to obtain the computing task joint scheduling model , training the initial model according to the third optimization objective function to obtain the resource allocation model.
The scheduling optimization method according to claim 13, wherein the step of inputting the pending tasks and current location information into a preset scheduling optimization model to obtain a scheduling strategy includes:

inputting the current location information into the UAV trajectory planning model, and calculating predicted location information of the at least one mobile device;

Input the task to be processed and the predicted location information into the task joint scheduling model, and calculate the task scheduling decision variable of the at least one mobile device;

Input the pending tasks and task scheduling decision variables into the resource allocation model to calculate a scheduling strategy.
The scheduling optimization method according to claim 14, wherein the step of inputting the current location information into the UAV trajectory planning model and calculating the predicted location information of the at least one mobile device includes:

performing motion prediction processing according to the current location information to obtain the next location information of the at least one mobile device;

Perform clustering processing on the next location information of the at least one mobile device to obtain predicted location information.
The scheduling optimization method according to claim 14 or 15, characterized in that the step of inputting the pending task and predicted location information into the task joint scheduling model and calculating the task scheduling decision variable of the at least one mobile device includes:

performing task joint scheduling training processing according to the pending task and predicted location information, to obtain the decision-making action of the at least one mobile device;

The decision-making actions are integrated to obtain task scheduling decision variables.
A scheduling optimization device, characterized in that it is applied to electronic equipment, the electronic equipment is connected in communication with a mobile edge computing network system, the mobile edge computing network system includes at least one base station, unmanned aerial vehicle and mobile device, and the scheduling optimization device includes:

A task acquisition module configured to: acquire pending tasks and current location information of the at least one mobile device, wherein the pending tasks include a first task and a second task;

The scheduling strategy acquisition module is configured to: input the task to be processed and the current location information into a preset scheduling optimization model to obtain a scheduling strategy, wherein the scheduling optimization model is obtained by training based on the established initial model;

A scheduling strategy sending module, configured to: send the scheduling strategy to the at least one mobile device, so that the at least one mobile device, based on the scheduling strategy, sends the first task to the at least one UAV for processing and forwards the second task to the at least one base station through the at least one UAV for processing.
The scheduling optimization device according to claim 17, wherein the scheduling optimization device is implemented as the task offloading device according to claim 8.
An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the task offloading method according to any one of claims 1 to 7 and the scheduling optimization method according to any one of claims 9 to 16 are implemented.
A storage medium, characterized in that the storage medium includes a computer program, and when the computer program runs, the electronic device where the storage medium is located is controlled to execute the task offloading method according to any one of claims 1 to 7 and the scheduling optimization method according to any one of claims 9 to 16.