CN117009053A - Task processing method of edge computing system and related equipment - Google Patents
Task processing method of edge computing system and related equipment
- Publication number
- CN117009053A (application number CN202310860159.XA)
- Authority
- CN
- China
- Prior art keywords
- task
- simulation
- terminal
- edge
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
- G06F9/5072: Grid computing
- G06F9/5077: Logical partitioning of resources; management or configuration of virtualized resources
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a task processing method of an edge computing system and related equipment. The method comprises the following steps: establishing an edge computing simulation system matched with the edge computing system; training a multi-agent deep reinforcement learning model in the edge computing simulation system to obtain a task offloading scheduling algorithm, and obtaining a computing resource allocation algorithm based on a convex optimization mathematical model; transmitting the task offloading scheduling algorithm and the computing resource allocation algorithm to the terminal, so that the terminal schedules its target task according to the task offloading scheduling algorithm; and sending the computing resource allocation algorithm to the edge server, so that the terminal and/or the edge server processes the target task according to the computing resource allocation algorithm. This effectively reduces the task execution time and the task energy consumption cost required for executing the target task, while effectively improving reliability during transmission of the target task, thereby improving the service quality of the edge computing system.
Description
Technical Field
The embodiment of the application relates to the technical field of communication, in particular to a task processing method of an edge computing system and related equipment.
Background
In recent years, with the rapid development of mobile devices and the Internet of Things, computation-intensive services have increased significantly, and terminal devices with limited computing power and sensitive power consumption find it increasingly difficult to rely entirely on local processing of computing tasks. As one of the emerging technologies of the Internet of Things, mobile edge computing solves this problem by having an Internet of Things node offload its computing tasks to a nearby server with sufficient computing resources. Meanwhile, because the computing resources of an edge server are relatively limited, it is difficult to meet the offloading demand of a large number of computation-intensive tasks from terminal devices, so decisions on whether to offload tasks must be made reasonably, and computing resources must be allocated reasonably to arriving tasks. Therefore, a suitable set of task offloading scheduling and computing resource allocation methods for an edge computing system can effectively reduce the energy consumption and time delay required for Internet of Things nodes and edge servers to compute tasks, thereby improving user experience.
In the related art, most task offloading scheduling methods for edge computing systems focus on reducing time cost or energy consumption; the problem remains that communication is easily interrupted by channel instability, which affects the execution of edge computing tasks.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, embodiments of the present application provide a task processing method and related device for an edge computing system, which can reduce the problem that communication is easily interrupted due to channel instability while reducing time cost and energy consumption.
In a first aspect, an embodiment of the present application provides a task processing method for an edge computing system, where the edge computing system includes a main server, at least two edge servers, and at least one terminal; the terminal is used for initiating a target task to be processed, the terminal or the edge server is used for executing the received target task, the task processing method is applied to the main server, and the task processing method comprises the following steps:
establishing an edge computing simulation system matched with the edge computing system; the edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server;
training a multi-agent deep reinforcement learning model in the edge computing simulation system to obtain a task offloading scheduling algorithm, so that the task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to execute tasks reach a first optimization target; the task offloading scheduling algorithm is used for adjusting the execution allocation strategy and the transmission power strategy adopted by the terminal when executing a target task;
in the edge computing simulation system, obtaining a computing resource allocation algorithm based on a convex optimization mathematical model, so that the task execution time and the task energy consumption cost reach a second optimization target; the computing resource allocation algorithm is used for adjusting the computing resource allocation adopted by the terminal and/or the edge server when executing the target task;
transmitting the task offloading scheduling algorithm and the computing resource allocation algorithm to the terminal, so that the target task of the terminal is processed according to the task offloading scheduling algorithm and the computing resource allocation algorithm;
and sending the computing resource allocation algorithm to the edge server so that the edge server processes the received target task according to the computing resource allocation algorithm.
According to some embodiments of the application, the establishing an edge computing simulation system that matches the edge computing system includes:
acquiring the transmission time delay of the simulation terminal to send a task to the simulation server;
acquiring a first calculation time delay required by the simulation terminal to execute a task;
acquiring a second calculation time delay required by the simulation server to execute a task;
determining the task execution time according to the transmission time delay, the first calculation time delay and the second calculation time delay;
acquiring transmission energy consumption of the simulation terminal to send a task to the simulation server;
acquiring first calculation energy consumption required by the simulation terminal to execute a task;
acquiring second calculation energy consumption required by the simulation server to execute a task;
determining the task energy consumption cost according to the transmission energy consumption, the first calculation energy consumption and the second calculation energy consumption;
acquiring the task transmission reliability probability;
determining the task transmission reliability according to the task transmission reliability probability;
constructing a simulation task optimization target according to the task execution time, the task energy consumption cost and the task transmission reliability; wherein the simulation task optimization objective includes the first optimization objective and the second optimization objective.
According to some embodiments of the application, the acquiring the transmission delay of the simulation terminal sending task to the simulation server; acquiring a first calculation time delay required by the simulation terminal to execute a task; obtaining a second computation time delay required by the simulation server to execute a task, including:
Based on a non-orthogonal frequency division multiple access communication protocol, establishing a simulation channel according to a Rayleigh fading model and a free space path loss model; the simulation channel is used for describing wireless channel link information between the simulation terminal and the simulation server;
establishing task information of a simulation task based on a task random arrival model;
based on a Shannon formula model, the simulation channel and the task information, constructing the transmission delay by taking the transmitting power as a variable;
acquiring terminal computing resources of the simulation terminal;
based on the terminal computing resources and the task information, constructing a first computing time delay required by the simulation terminal to execute the task by taking computing resource allocation of the simulation terminal as a variable;
acquiring server computing resources of the simulation server;
and based on the server computing resources and the task information, constructing a second computing time delay required by the simulation server to execute the task by taking the computing resource allocation of the simulation server as a variable.
According to some embodiments of the application, the acquiring the transmission energy consumption of the simulation terminal to send the task to the simulation server; acquiring first calculation energy consumption required by the simulation terminal to execute a task; obtaining second calculation energy consumption required by the simulation server to execute tasks, including:
determining the transmission energy consumption based on the Shannon formula model, the simulation channel and the task information;
determining first calculation energy consumption required by the simulation terminal to execute a task based on the terminal calculation resource and the task information;
and determining second computing energy consumption required by the simulation server to execute tasks based on the server computing resources and the task information.
According to some embodiments of the application, the acquiring the task transmission reliability probability includes:
based on the simulation channel and the Shannon formula model, constructing the transmission rate of the simulation terminal for transmitting information to the simulation server by taking the transmission power of the simulation terminal as a variable;
and calculating the probability that the transmission rate reaches a preset condition as the task transmission reliability probability.
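As a hedged illustration of the closed form such a probability can take (not verbatim from the patent): with Rayleigh fading, the channel power gain $|h|^2$ is exponentially distributed, so for an interference-free link of bandwidth $W$, transmit power $p$, path loss $g$ and noise power $\sigma^2$, the probability that the rate reaches a threshold $r_{\mathrm{th}}$ is:

```latex
% Reliability of reaching rate r_th on a Rayleigh-faded link,
% assuming |h|^2 ~ Exp(1) and omitting co-channel interference:
\Pr\{ r \ge r_{\mathrm{th}} \}
  = \Pr\Big\{ |h|^2 \ge \frac{\big(2^{r_{\mathrm{th}}/W} - 1\big)\sigma^2}{p\,g} \Big\}
  = \exp\Big( -\frac{\big(2^{r_{\mathrm{th}}/W} - 1\big)\sigma^2}{p\,g} \Big)
```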
According to some embodiments of the application, training a multi-agent deep reinforcement learning model in an edge computing simulation system includes:
acquiring simulation state information of the current time slot of the simulation terminal;
acquiring an alternative action decision corresponding to the simulation state information;
determining a reward model according to the first optimization objective;
taking the simulation state information and the alternative action decisions as training samples, and performing multiple rounds of training on the multi-agent deep reinforcement learning model to obtain a trained multi-agent deep reinforcement learning model as the task offloading scheduling algorithm; the training process of the multi-agent deep reinforcement learning model comprises: inputting the training samples into the multi-agent deep reinforcement learning model, and adjusting the multi-agent deep reinforcement learning model by using the reward model.
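For illustration only (not the patent's implementation), the overall training procedure can be sketched as follows; `env`, `model` and `buffer` are hypothetical stand-ins for the edge computing simulation system, the multi-agent deep reinforcement learning model and an experience store.

```python
# Minimal training-loop sketch (hypothetical interfaces): roll out the
# simulation, score actions with the reward model derived from the
# first optimization target, and adjust the model from the samples.
import random

def train(env, model, buffer, episodes=1000, slots=200, batch=64):
    for _ in range(episodes):
        state = env.reset()                        # simulation state of current slot
        for _ in range(slots):
            action = model.act(state)              # alternative action decision
            next_state, reward = env.step(action)  # reward model output
            buffer.append((state, action, reward, next_state))
            if len(buffer) >= batch:
                model.update(random.sample(buffer, batch))  # adjust networks
            state = next_state
    return model  # trained model serves as the task offloading scheduler
```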
According to some embodiments of the application, the multi-agent deep reinforcement learning model includes an offloading decision network, a transmission power distribution network, a local Q network and a global Q network, and the inputting of training samples into the multi-agent deep reinforcement learning model and adjusting of the model with the reward model includes:
inputting the simulation state information into the offloading decision network to obtain a relaxed task offloading action;
expanding the relaxed task offloading action based on a task offloading action expansion algorithm to obtain an alternative offloading action group; the alternative offloading action group includes a plurality of alternative offloading actions;
inputting the simulation state information and the alternative offloading action group into the transmission power distribution network to obtain an alternative power distribution space; the alternative power distribution space includes an alternative power allocation group for each alternative offloading action, and each alternative power allocation group includes a plurality of alternative power allocation actions matched to its alternative offloading action;
inputting the alternative power distribution space into the local Q network to obtain a local Q value, and taking an alternative power distribution action corresponding to the maximum local Q value as a local action decision;
executing the local action decision based on the task information and the reward model to obtain local simulation rewards;
inputting the local Q value into the global Q network to obtain a global Q value;
based on the global Q value, a global action decision is obtained;
executing the global action decision based on the task information and the reward model to obtain global simulation rewards;
updating the offloading decision network based on the relaxed task offloading action and the local action decision;
updating the local Q network based on the local simulation rewards, the local action decisions and a local Q value for a next time slot;
updating the global Q network based on the global simulation rewards, the global action decisions and the global Q value of the next time slot;
updating the transmission power distribution network based on the global Q value;
the training of the multi-agent deep reinforcement learning model to obtain the task offloading scheduling algorithm includes:
obtaining the task offloading scheduling algorithm according to the offloading decision network, the transmission power distribution network and the local Q network in the trained multi-agent deep reinforcement learning model.
According to some embodiments of the application, the global Q network includes a first processing module and a second processing module;
The step of inputting the local Q value into a global Q network to obtain a global Q value comprises the following steps:
inputting the local Q values into the first processing module, so that the first processing module takes the simulation state information as a first weight and processes the local Q values in combination with the first weight to obtain a high-dimensional feature vector;
and inputting the high-dimensional feature vector into a second processing module, so that the second processing module takes the simulation state information as a second weight, and processing the high-dimensional feature vector by combining the second weight to obtain a global Q value.
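The two processing modules act like state-conditioned hypernetworks that mix the per-terminal local Q values into one global Q value, reminiscent of a QMIX-style mixer. A minimal sketch follows; the layer shapes, the `abs` positivity trick and all names are illustrative assumptions rather than the patent's architecture.

```python
# Global-Q-network sketch (illustrative): hypernetworks generate
# state-conditioned weights; abs() keeps the mixing monotone in the
# local Q values, as in QMIX-style mixers.
import torch
import torch.nn as nn

class GlobalQNetwork(nn.Module):
    def __init__(self, n_agents, state_dim, hidden=32):
        super().__init__()
        # first processing module: state -> weights over local Q values
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        # second processing module: state -> weights over the feature vector
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.hidden = n_agents, hidden

    def forward(self, local_q, state):
        # local_q: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.hidden)
        feat = torch.relu(torch.bmm(local_q.unsqueeze(1), w1).squeeze(1)
                          + self.b1(state))          # high-dimensional feature
        w2 = torch.abs(self.w2(state)).unsqueeze(2)  # (batch, hidden, 1)
        q_tot = torch.bmm(feat.unsqueeze(1), w2).squeeze(2) + self.b2(state)
        return q_tot                                 # global Q value
```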
According to some embodiments of the application, the updating of the offloading decision network based on the relaxed task offloading action and the local action decision comprises:
taking the mean square error between the relaxed task offloading action and the local action decision as the offloading decision network error;
updating the offloading decision network based on the gradient descent model and the offloading decision network error.
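A sketch of this update under assumed PyTorch interfaces: the relaxed offloading action produced by the offloading decision network is regressed toward the selected local action decision by minimizing their mean squared error.

```python
# Offloading-decision-network update sketch (illustrative names): the
# MSE between the relaxed offloading action and the chosen local
# action decision is the network error, minimized by a gradient step.
import torch.nn.functional as F

def update_offloading_net(net, optimizer, state, local_action_decision):
    relaxed = net(state)                               # relaxed offloading action
    loss = F.mse_loss(relaxed, local_action_decision)  # decision network error
    optimizer.zero_grad()
    loss.backward()                                    # gradient descent model
    optimizer.step()
    return loss.item()
```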
According to some embodiments of the application, the updating the local Q network based on the local simulation rewards, the local action decisions and a local Q value of a next slot comprises:
Based on the local action decision, obtaining a local action decision Q value corresponding to the local action decision;
adding the local Q value of the next time slot to the local simulation rewards to obtain a local network difference value;
taking the mean square error of the local action decision Q value and the local network difference value as a local network error;
updating the local Q network based on a gradient descent model and the local network error.
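A sketch of this temporal-difference-style update with assumed PyTorch interfaces; a discount factor `gamma` is included as a standard assumption even though this passage only states that the next slot's local Q value and the local simulation reward are added.

```python
# Local-Q-network update sketch (illustrative): the target is the
# local simulation reward plus the (discounted) local Q value of the
# next time slot; the error is the MSE against the Q value of the
# chosen local action decision.
import torch.nn.functional as F

def update_local_q(q_net, optimizer, state, action, reward, next_q, gamma=0.99):
    q_decision = q_net(state, action)          # local action decision Q value
    target = reward + gamma * next_q.detach()  # local network difference value
    loss = F.mse_loss(q_decision, target)      # local network error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The global Q network update described next follows the same pattern, using the global simulation rewards, the global action decision Q value and the global Q value of the next time slot.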
According to some embodiments of the application, the updating the global Q network based on the global simulation rewards, the global action decisions and the global Q value of the next slot includes:
based on the global action decision, obtaining a global action decision Q value corresponding to the global action decision;
adding the global Q value of the next time slot to the global simulation rewards to obtain a global network difference value;
taking the mean square error of the global action decision Q value and the global network difference value as a global network error;
and updating the global Q network based on the gradient descent model and the global network error.
According to some embodiments of the application, the updating the transmission power distribution network based on the global Q value includes:
taking the negative of the global Q value as the transmission power distribution network error;
updating the transmission power distribution network based on the gradient descent model and the transmission power distribution network error.
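A sketch of this step under the same assumed interfaces: because the loss is the negative of the global Q value, a gradient step on the transmission power distribution network's parameters pushes it toward power allocations with a higher global Q.

```python
# Transmission-power-network update sketch (illustrative): minimizing
# the negative global Q value maximizes the global Q of the produced
# power allocation actions.
def update_power_net(optimizer, global_q):
    loss = -global_q.mean()  # transmission power distribution network error
    optimizer.zero_grad()
    loss.backward()          # gradient descent model
    optimizer.step()
```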
According to some embodiments of the application, the expanding of the relaxed task offloading action based on the task offloading action expansion algorithm to obtain an alternative offloading action group includes:
processing the relaxed task offloading action based on the task offloading action expansion algorithm and the bisection method to obtain a first alternative offloading action;
processing the relaxed task offloading action based on the task offloading action expansion algorithm, the bisection method and ascending sorting to obtain a second alternative offloading action;
the first and second alternative offloading actions constitute the alternative offloading action group.
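Interpreting the bisection step as 0.5-thresholding, and the ascending-sort step as thresholding at the sorted relaxed entries, the expansion can be sketched as below in the spirit of order-preserving quantization; the patent's exact expansion rule may differ, so treat this as an assumption-laden illustration.

```python
# Offloading-action expansion sketch (illustrative): turn a relaxed
# action in [0,1]^M into a small group of binary candidate actions.
import numpy as np

def expand_offloading_action(relaxed):
    relaxed = np.asarray(relaxed, dtype=float)
    candidates = [(relaxed > 0.5).astype(int)]   # bisection at 0.5
    for th in np.sort(relaxed):                  # ascending-sort thresholds
        candidates.append((relaxed >= th).astype(int))
    unique, seen = [], set()                     # de-duplicate, keep order
    for c in candidates:
        if tuple(c) not in seen:
            seen.add(tuple(c))
            unique.append(c)
    return unique                                # alternative offloading action group

# expand_offloading_action([0.8, 0.3, 0.6]) ->
# [array([1, 0, 1]), array([1, 1, 1]), array([1, 0, 0])]
```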
According to some embodiments of the application, in the edge computing simulation system, based on the convex optimization mathematical model, a computing resource allocation algorithm is obtained to enable the task execution time and the task energy consumption cost to reach a second optimization target, including:
constructing a computing resource optimization function according to the task execution time and the task energy consumption cost;
Taking the computing resource allocation as a computing resource optimization variable;
taking the allocable range of the computing resource allocation as a computing resource constraint;
constructing a computing resource optimization problem based on the computing resource optimization function, the computing resource optimization variable and the computing resource constraint;
and processing the computational resource optimization problem by using the convex optimization mathematical model to obtain the computational resource allocation algorithm.
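With the offloading decision fixed, a weighted time-plus-energy objective is convex in the allocation fractions, so an off-the-shelf convex solver can recover the allocation. A sketch under assumed modeling choices (a dynamic-power energy term proportional to the square of the allocated frequency, and illustrative weights), using cvxpy:

```python
# Computing-resource-allocation sketch (illustrative model, not the
# patent's exact formulation): minimize weighted delay + energy over
# per-task CPU fractions alpha, subject to sum(alpha) <= 1.
import cvxpy as cp
import numpy as np

def allocate(cycles, f, kappa=1e-27, w_time=1.0, w_energy=1.0):
    cycles = np.asarray(cycles, dtype=float)   # cycles required per task
    alpha = cp.Variable(len(cycles), nonneg=True)  # resource fractions
    delay = cp.multiply(cycles, cp.inv_pos(alpha)) / f        # C / (alpha f)
    energy = kappa * (f ** 2) * cp.multiply(cycles, cp.square(alpha))
    objective = cp.Minimize(cp.sum(w_time * delay + w_energy * energy))
    problem = cp.Problem(objective, [cp.sum(alpha) <= 1])     # allocable range
    problem.solve()
    return alpha.value

# e.g. allocate([1e9, 2e9], f=5e9) splits a 5 GHz device between two tasks
```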
In a second aspect, an embodiment of the present application provides a task processing method of an edge computing system, applied to the edge server, the task processing method comprising:
receiving, from the main server, a computing resource allocation algorithm generated by executing the task processing method of an edge computing system as described in the first aspect;
transmitting current edge state information to the terminal;
receiving a target task from the terminal;
executing a computing resource allocation algorithm according to the target task to obtain an edge computing strategy;
and scheduling computing resources to process the target task based on the edge computing strategy.
In a third aspect, an embodiment of the present application provides a task processing method of an edge computing system, applied to the terminal, the task processing method comprising:
receiving, from the main server, the task offloading scheduling algorithm and the computing resource allocation algorithm generated by executing the task processing method of an edge computing system as described in the first aspect;
acquiring current target task and wireless communication environment information;
receiving edge state information from the edge server;
executing the task offloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling policy;
and offloading the target task to at least one edge server for processing, or processing the target task locally, based on the task scheduling policy; when the target task is processed locally, it is processed according to the computing resource allocation algorithm.
According to some embodiments of the application, the executing of the task offloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling policy includes:
inputting the target task, the wireless communication environment information and the edge state information as parameters into the offloading decision network to obtain a target relaxed task offloading action;
expanding the target relaxed task offloading action based on the task offloading action expansion algorithm to obtain a target alternative offloading action group;
inputting the target task, the wireless communication environment information, the edge state information and the target alternative offloading action group as parameters into the transmission power distribution network to obtain a target transmission power distribution space;
and inputting the target transmission power distribution space into the local Q network to obtain the task scheduling policy.
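Chaining the trained networks at inference time can be sketched as follows; `build_state` and the network call signatures are hypothetical, and `expand_offloading_action` refers to the expansion sketch given earlier.

```python
# Terminal-side scheduling sketch (illustrative interfaces): pick the
# candidate (offloading action, power allocation) pair whose local Q
# value is highest.
def schedule(task, channel_info, edge_state,
             offloading_net, power_net, local_q_net):
    state = build_state(task, channel_info, edge_state)  # hypothetical helper
    relaxed = offloading_net(state)              # relaxed offloading action
    best, best_q = None, float("-inf")
    for offload in expand_offloading_action(relaxed):
        for power in power_net(state, offload):  # alternative power group
            q = local_q_net(state, offload, power)  # local Q value
            if q > best_q:
                best, best_q = (offload, power), q
    return best  # task scheduling policy: offload targets + transmit powers
```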
In a fourth aspect, an embodiment of the present application provides a main server, which is applied to an edge computing system, where the edge computing system includes the main server, at least two edge servers, and at least one terminal, and the main server includes:
the simulation module is used for establishing an edge computing simulation system matched with the edge computing system; the edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server;
the training module is used for training the multi-agent deep reinforcement learning model in the edge computing simulation system to obtain a task offloading scheduling algorithm, so that the task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to execute tasks reach a first optimization target;
The optimization module is used for acquiring a computing resource allocation algorithm based on a convex optimization mathematical model in the edge computing simulation system so as to enable the task execution time and the task energy consumption cost to reach a second optimization target;
the first sending module is used for sending the task offloading scheduling algorithm and the computing resource allocation algorithm to the terminal, so that the target task of the terminal is processed according to the task offloading scheduling algorithm and the computing resource allocation algorithm;
and the second sending module is used for sending the computing resource allocation algorithm to the edge server so that the edge server processes the received target task according to the computing resource allocation algorithm.
In a fifth aspect, an embodiment of the present application provides an edge server applied to the edge computing system according to the fourth aspect, where the edge server includes:
a first edge receiving module, configured to receive, from the main server, a computing resource allocation algorithm generated by executing the task processing method of an edge computing system according to the first aspect;
the edge sending module is used for sending current edge state information to the terminal;
The second edge receiving module is used for receiving a target task from the terminal;
the edge execution module is used for executing a computing resource allocation algorithm according to the target task to obtain an edge computing strategy;
and the edge processing module is used for scheduling computing resources to process the target task based on the edge computing strategy.
In a sixth aspect, an embodiment of the present application provides a terminal applied to the edge computing system according to the fourth aspect, where the terminal includes:
a first terminal receiving module, configured to receive, from the main server, the task offloading scheduling algorithm and the computing resource allocation algorithm generated by executing the task processing method of an edge computing system according to the first aspect;
the terminal acquisition module is used for acquiring the current target task and the wireless communication environment information;
the second terminal receiving module is used for receiving the edge state information from the edge server;
the terminal execution module is used for executing the task offloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling policy;
the terminal processing module is used for offloading the target task to at least one edge server for processing, or processing the target task locally, based on the task scheduling policy; when the target task is processed locally, it is processed according to the computing resource allocation algorithm.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program for:
executing the task processing method of an edge computing system according to the first aspect;
or executing the task processing method of an edge computing system according to the second aspect;
or executing the task processing method of an edge computing system according to the third aspect.
In an eighth aspect, an embodiment of the present application provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor;
wherein the processor is configured to execute via execution of the executable instructions:
executing the task processing method of an edge computing system according to the first aspect;
or executing the task processing method of an edge computing system according to the second aspect;
or executing the task processing method of an edge computing system according to the third aspect.
When the task processing method of the edge computing system according to the first aspect is executed, the main server builds an edge computing simulation system matched with the edge computing system, trains the task offloading scheduling algorithm of the multi-agent deep reinforcement learning model in the edge computing simulation system, and acquires the computing resource allocation algorithm. The task offloading scheduling algorithm is sent to the terminal so that the target task of the terminal is scheduled according to it, and the computing resource allocation algorithm is sent to the terminal and/or the edge server so that the terminal and/or the edge server can process the target task based on it. This effectively reduces the task execution time and task energy consumption cost required for executing the target task, effectively improves reliability during target task transmission, and thereby improves the service quality of the edge computing system.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
Fig. 1 is a schematic structural diagram of an edge computing system according to an embodiment of the present application.
Fig. 2 is a flowchart of a task processing method applied to a main server according to still another embodiment of the present application.
Fig. 3 is a flowchart of step S201 in fig. 2.
Fig. 4 is a flowchart of steps S301-S303 in fig. 3.
Fig. 5 is a flowchart of steps S305-S307 in fig. 3.
Fig. 6 is a flowchart of step S309 in fig. 3.
Fig. 7 is a flowchart of step S202 in fig. 2.
Fig. 8 is a flowchart of step S704 in fig. 7.
Fig. 9 is a flowchart of step S802 in fig. 8.
Fig. 10 is a flowchart of step S806 in fig. 8.
Fig. 11 is a flowchart of step S809 in fig. 8.
Fig. 12 is a flowchart of step S810 in fig. 8.
Fig. 13 is a flowchart of step S811 in fig. 8.
Fig. 14 is a flowchart of step S812 in fig. 8.
FIG. 15 is a block flow diagram of a training process for a multi-agent deep reinforcement learning model provided in accordance with yet another embodiment of the present application.
FIG. 16 is a convergence simulation diagram of the task average energy consumption during training of a multi-agent deep reinforcement learning model according to yet another embodiment of the present application.
FIG. 17 is a convergence simulation diagram of the task average time delay during training of a multi-agent deep reinforcement learning model according to another embodiment of the present application.
FIG. 18 is a convergence simulation diagram of the task average reliability during training of a multi-agent deep reinforcement learning model according to yet another embodiment of the present application.
Fig. 19 is a flowchart of step S203 in fig. 2.
Fig. 20 is a flowchart of a task processing method applied to an edge server according to another embodiment of the present application.
Fig. 21 is a flowchart of a task processing method applied to a terminal according to still another embodiment of the present application.
Fig. 22 is a flowchart of step S2104 in fig. 21.
FIG. 23 is an interactive flow chart of a task processing method according to another embodiment of the present application.
Fig. 24 is a block diagram of a main server according to still another embodiment of the present application.
Fig. 25 is a block diagram of an edge server according to another embodiment of the present application.
Fig. 26 is a block diagram of a terminal structure according to still another embodiment of the present application.
Fig. 27 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional modules are divided in the device schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the module division in the device or the order in the flowchart.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms involved in the present application are explained:
Edge Computing is a distributed computing model that moves computing, storage and network functions from a traditional centralized cloud data center to edge devices or edge nodes closer to the data source for processing. The purpose of edge computing is to reduce data transmission delay and improve network efficiency and data security by performing data processing and analysis on edge devices.
Deep reinforcement learning (Deep Reinforcement Learning) is a method that combines deep learning and reinforcement learning to solve complex decision problems with continuous states and action space. The core idea of deep reinforcement learning is to learn through the interaction of agents with the environment to maximize the jackpot.
Convex Optimization is a class of mathematical optimization problems in which both the objective function and the constraints are convex. In convex optimization, a convex objective function is minimized (or maximized) subject to a set of convex constraints.
Non-orthogonal frequency division multiple access (NOMA) is a multi-user communication technique for enabling spectrum sharing among multiple users in a wireless communication system. In contrast to conventional orthogonal frequency division multiple access (Orthogonal Frequency Division Multiple Access, OFDMA), NOMA allows multiple users to transmit data on the same frequency band simultaneously without the need to divide the frequency band into mutually orthogonal subcarriers.
Rayleigh fading (Rayleigh fading) is a fading model of a wireless channel that describes the random fading phenomenon experienced by a wireless signal during transmission. Rayleigh fading is mainly caused by multipath propagation, in which a signal undergoes reflection, scattering and diffraction of multiple paths during transmission, resulting in random variation of amplitude and phase of the signal.
Free space path loss (Free Space Path Loss) is a phenomenon of signal strength decay of a wireless signal due to an increase in distance during free space propagation. It is an important signal propagation model in a wireless communication system for estimating the attenuation of a signal as it propagates in space.
The Shannon formula, also known as the Shannon capacity formula, was derived by the founder of information theory, Claude Shannon, in 1948, to calculate the maximum reliable transmission rate over a noisy communication channel.
In recent years, with the rapid development of mobile devices and the Internet of Things, computation-intensive services have increased significantly, and terminal devices with limited computing power and sensitive power consumption find it increasingly difficult to rely entirely on local processing of computing tasks. As one of the emerging technologies of the Internet of Things, mobile edge computing solves this problem by having an Internet of Things node offload its computing tasks to a nearby server with sufficient computing resources. Meanwhile, because the computing resources of an edge server are relatively limited, it is difficult to meet the offloading demand of a large number of computation-intensive tasks from terminal devices, so decisions on whether to offload tasks must be made reasonably, and computing resources must be allocated reasonably to arriving tasks. Therefore, a suitable set of task offloading scheduling and computing resource allocation methods for an edge computing system can effectively reduce the energy consumption and time delay required for Internet of Things nodes and edge servers to compute tasks, thereby improving user experience.
With the development of 5G wireless communication technology, non-orthogonal frequency division multiple access is widely used; by means of successive interference cancellation, it can effectively improve channel resource utilization and increase network density and capacity. During communication, devices occupying the same channel interfere with one another: users with high channel quality are decoded preferentially, while other users with lower channel quality are treated as interference signals, which affects the signal-to-noise ratio and in turn the transmission rate. Therefore, the wireless communication method needs to adjust transmission power appropriately based on channel quality, thereby reducing interference. In addition, since channel quality is affected by Rayleigh fading, it may vary sharply, causing the transmission rate per unit bandwidth to fall below a threshold and thus interrupting communication.
Furthermore, redundancy processing can be implemented in an edge computing system using non-orthogonal frequency division multiple access. Redundancy refers to simultaneously offloading the same computing task of a terminal to multiple edge servers. However, this inevitably wastes computing resources to some extent.
In the related art, most task offloading scheduling methods for edge computing systems focus on reducing time cost or energy consumption; the problem remains that communication is easily interrupted by channel instability, which affects the execution of edge computing tasks.
Based on the above, embodiments of the present application provide a task processing method of an edge computing system and related equipment. A main server establishes an edge computing simulation system matched with the edge computing system, trains the task offloading scheduling algorithm of a multi-agent deep reinforcement learning model in the edge computing simulation system, and acquires a computing resource allocation algorithm. The task offloading scheduling algorithm is then sent to the terminal so that the target task of the terminal is scheduled according to it, and finally the computing resource allocation algorithm is sent to the terminal and/or the edge server so that they can process the target task based on it. This effectively reduces the task execution time and task energy consumption cost required for executing the target task, effectively improves reliability during target task transmission, and further improves the service quality of the edge computing system.
The embodiment of the application provides a task processing method and related equipment of an edge computing system, and specifically describes the following embodiment.
An edge computing system to which the task processing method according to the embodiment of the present application is applied is first described. Referring to fig. 1, a schematic structure of an edge computing system according to an embodiment of the present application is shown.
In one embodiment, the edge computing system includes a main server 110, at least two edge servers 120, and at least one terminal 130.
In some embodiments, the main server 110 may be a stand-alone server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
In some embodiments, the edge server 120 refers to a server device or system located at the edge of a network, and is used to provide edge computing, edge storage, and edge network services, etc.; the edge server may be a stand-alone server located around the terminal and capable of providing computing resources, or may be another terminal located near a terminal and capable of providing computing resources. Multiple edge servers may provide task collaboration processing for the same terminal.
In some embodiments, it is understood that at least one terminal 130 may be one terminal or a plurality of terminals, and may be four terminals as shown in fig. 1. In some embodiments, the terminal may be a smart phone, tablet, notebook, desktop, or smart watch, or the like.
In some embodiments, the terminal 130 is used to initiate a target task to be processed. Meanwhile, the terminal 130 and the edge server 120 are used to perform the received target task. In addition, communication connection may be implemented between the main server 110, the edge server 120, and the terminal 130.
It is understood that a communication connection refers to establishing and maintaining a communication link between two or more devices in a computer network, including wired connections (e.g., fiber optic connections, serial port connections, etc.), wireless connections (e.g., wiFi, mobile networks, bluetooth, etc.), and virtual connections (e.g., virtual local area networks, virtual private networks, etc.).
It should be noted that, the structure of the edge computing system described in the embodiment of the present application is to more clearly describe the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided in the embodiment of the present application, and those skilled in the art can know that, with the evolution of the device architecture and the appearance of the new application scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
It will be appreciated by those skilled in the art that the edge computing system shown in FIG. 1 is not limiting of embodiments of the application and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Based on the above-mentioned edge computing system, a task processing method of the edge computing system according to an embodiment of the present application is further described below with reference to fig. 2 to 19, where the task processing method is applied to a main server in the edge computing system.
Referring to fig. 2, which is a flowchart illustrating a task processing method applied to a main server in an embodiment of the present application, the method in fig. 2 may include, but is not limited to, steps S201 to S205. It should be understood that the order of steps S201 to S205 in fig. 2 is not particularly limited, and the order of steps may be adjusted, or some steps may be reduced or added according to actual requirements.
Step S201: an edge computing simulation system is established that matches the edge computing system.
In some embodiments, in order to describe the edge computing system explicitly in mathematical form, so that resource scheduling in the edge computing system can be arranged more appropriately from a mathematical perspective and the performance of the edge computing system thereby optimized, an edge computing simulation system matched with the edge computing system needs to be established. The established edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server.
Furthermore, in order to describe the performance of an edge computing system from multiple angles, relevant performance parameters characterizing the performance of the edge computing system need to be built up in the edge computing simulation system accordingly.
Thus, in one embodiment, referring to fig. 3, constructing an edge computing simulation system that matches an edge computing system further includes the following steps S301 to S311:
step S301: and acquiring the transmission time delay of the simulation terminal to send the task to the simulation server.
Step S302: and acquiring a first calculation time delay required by the simulation terminal to execute the task.
Step S303: and obtaining a second calculation time delay required by the simulation server to execute the task.
In some embodiments, in order to describe, in the edge computing simulation system, the parameters related to task execution time cost that match the real edge computing system, these parameters are refined into the transmission delay for the simulation terminal to send a task to the simulation server, the first computation delay required by the simulation terminal to execute the task by local computation, and the second computation delay required by the simulation server to execute the task.
In addition, in some embodiments, in order to more precisely describe the above transmission delay, the first computation delay, and the second computation delay, reference needs to be made to other parameters, including, but not limited to, a wireless channel environment during a transmission task.
In an embodiment, referring to fig. 4, the steps of obtaining the transmission delay of the simulation terminal for sending the task to the simulation server, obtaining the first computation delay required by the simulation terminal for executing the task, and obtaining the second computation delay required by the simulation server for executing the task further include the following steps S401 to S407:
Step S401: based on non-orthogonal frequency division multiple access communication protocol, a simulation channel is established according to a Rayleigh fading model and a free space path loss model.
In some embodiments, the emulated channel is used to describe wireless channel link information between the emulated terminal and the emulated server.
The edge computing simulation system comprises a plurality of simulation terminals $\mathcal{N} = \{1, 2, \dots, N\}$ and a plurality of simulation servers $\mathcal{M} = \{1, 2, \dots, M\}$, and discretizes continuous time into a plurality of time slots. The simulation terminals and the simulation servers communicate wirelessly using non-orthogonal frequency division multiple access, so that simulation terminals communicating with the same simulation server share a channel, each channel having width $W$ and small-scale Rayleigh fading $h_{n,m}$. Recording the distance between simulation terminal $n$ and simulation server $m$ as $d_{n,m}$, the path loss is determined by the free-space path loss model:

$$g_{n,m} = \zeta \left( \frac{\lambda}{4 \pi d_{n,m}} \right)^{2}$$

where $\zeta$ is the path loss coefficient and $\lambda$ is the wavelength. The simulated channel is composed of the small-scale Rayleigh fading $h_{n,m}$ and the path loss $g_{n,m}$.
Step S402: and establishing task information of the simulation task based on the task random arrival model.
In some embodiments, the task information of the simulation task is established accordingly from an actual edge computing scenario, such as many computationally intensive internet of things applications (e.g., image/video/voice recognition, file scanning, data analysis, multi-sensor information processing, etc.).
At each time slot $t$, a task with task information $\Lambda_n(t) = \big(D_n(t), C_n(t)\big)$ arrives at simulation terminal $n$, where $D_n(t)$ is the task data packet size and $C_n(t)$ is the amount of computation required by the task. The task offloading decision is expressed as $x_{n,m}(t) \in \{0, 1\}$, where $x_{n,m}(t) = 1$ represents simulation terminal $n$ offloading its task to simulation server $m$, and $x_{n,m}(t) = 0$ represents no offloading to that server, in which case the task may instead be computed locally by the simulation terminal. Considering that the simulation terminal can process the task locally or redundantly offload it to any number of servers in the simulation server set $\mathcal{M}$, the decisions satisfy $0 \le \sum_{m \in \mathcal{M}} x_{n,m}(t) \le M$.
It will be appreciated that although redundancy processing inevitably consumes some of the computing resources of the edge servers, in this embodiment its application can improve reliability during task transmission. Therefore, to further reduce the loss of computing resources while improving transmission reliability, redundancy must be applied efficiently, i.e., the offloading schedule must be optimized selectively.
Step S403: based on shannon formula model, simulation channel and task information, using transmitting power as variable to construct transmission delay.
In some embodiments, in the edge computing simulation system, based on the constructed simulation channel and the task information, a transmission delay formula with the transmission power as a variable can be further derived through a shannon formula.
The transmission power allocation is recorded as $P=\{p_{n,m}(t)\}$, wherein $p_{n,m}(t)$ represents the transmission power allocated by simulation terminal $n$ for transmitting to simulation server $m$. Owing to the non-orthogonal frequency division multiple access technology, task contents are decoded in order of channel gain. At this time, for simulation server $m$, the other simulation terminals $k$ whose channel gain is lower than that of simulation terminal $n$ form interference to the task transmission process, $I_{n,m}(t) = \sum_{k:\,g_{k,m}(t) < g_{n,m}(t)} p_{k,m}(t)\, g_{k,m}(t)$, wherein $g_{n,m}(t) = |h_{n,m}(t)|^{2}\,\ell_{n,m}$ is the channel gain. According to the shannon formula, the maximum achievable transmission rate of simulation terminal $n$ is:

$$r_{n,m}(t) = W \log_2\!\left(1 + \frac{p_{n,m}(t)\, g_{n,m}(t)}{I_{n,m}(t) + \sigma^2}\right),$$

wherein $\sigma^2$ represents the Gaussian white noise power. The transmission delay of the task is accordingly $s_n(t)/r_{n,m}(t)$.
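To make this decoding-order model concrete, the sketch below computes the per-terminal achievable rates on one shared channel; the function and argument names are hypothetical:

```python
import numpy as np

def noma_rates(gains, powers, bandwidth, noise_power):
    """Shannon rates for terminals sharing one channel to the same server.

    Decoding follows the channel gain order: each terminal is interfered
    only by terminals with lower channel gain, as in the formula above.
    gains, powers: 1-D arrays indexed by terminal.
    """
    rates = np.empty_like(gains, dtype=float)
    interference = 0.0
    for idx in np.argsort(gains):  # lowest channel gain first
        rates[idx] = bandwidth * np.log2(
            1.0 + powers[idx] * gains[idx] / (interference + noise_power))
        interference += powers[idx] * gains[idx]  # seen by all higher-gain terminals
    return rates
```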
Step S404: and acquiring terminal computing resources of the simulation terminal.
In some embodiments, before the first calculation time delay required by the simulation terminal to execute the task is obtained, the terminal computing resources of the simulation terminal also need to be acquired.
In one embodiment, the terminal computing resource of simulation terminal $n$ is quantified as $f^{\mathrm{loc}}_{n}$ (in CPU cycles per second).
Step S405: based on the terminal computing resources and task information, computing resource allocation of the simulation terminal is used as a variable, and a first computing time delay required by the simulation terminal to execute the task is constructed.
In some embodiments, based on the terminal computing resources and the task information, a first computation delay with the computing resource allocation of the emulated terminal as a variable may be further derived.
Step S406: and obtaining server computing resources of the simulation server.
In some embodiments, before the second calculation time delay required by the simulation server to execute the task is obtained, the server computing resources of the simulation server also need to be acquired.
In one embodiment, the server computing resource of simulation server $m$ is quantified as $f^{\mathrm{ser}}_{m}$ (in CPU cycles per second).
Step S407: and based on the calculation resources of the server and the task information, constructing a second calculation time delay required by the simulation server to execute the task by taking the calculation resource allocation of the simulation server as a variable.
In some embodiments, based on the server computing resources and the task information, a second computing latency with the computing resource allocation of the simulation server as a variable may be further derived.
In some embodiments, the computing resource allocation decision is expressed as $\rho = \{\rho_{n,u}(t)\}$, wherein $\rho_{n,u}(t)$ indicates the proportion of the computing resource of device $u$ (a simulation server or a simulation terminal) allocated to the task generated by simulation terminal $n$, with $\sum_{n}\rho_{n,u}(t)\leq 1$. Since the simulation terminals do not offload tasks to one another, a simulation terminal allocates its computing resource only to its own task. For each task, the calculation time delay required for processing on device $u$ with computing resource $f_u$ is:

$$d^{\mathrm{comp}}_{n,u}(t) = \frac{c_n(t)}{\rho_{n,u}(t)\, f_u}.$$
The first calculation time delay and the second calculation time delay can be obtained through the calculation time delay formula.
Step S304: and determining task execution time according to the transmission time delay, the first calculation time delay and the second calculation time delay.
In some embodiments, based on the constructed transmission delay, the first computation delay, and the second computation delay, a task execution time required to execute a task in the edge computation simulation system may be further derived.
Thus, in the case of redundancy processing, the task finishes as soon as its fastest copy finishes, and the task execution latency is defined as:

$$T_n(t) = \min_{u:\,x_{n,u}(t)=1}\left(\mathbb{1}[u \neq 0]\,\frac{s_n(t)}{r_{n,u}(t)} + \frac{c_n(t)}{\rho_{n,u}(t)\, f_u}\right),$$

wherein the transmission term vanishes for local computing ($u=0$).
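A minimal sketch of this min-form latency, assuming device index 0 denotes local computing and all names are illustrative:

```python
def task_execution_latency(x, s, c, rates, rho, f):
    """Latency under redundancy: the task finishes with its fastest selected copy.

    x[u]: 1 if the task is dispatched to device u (u = 0 is the terminal itself).
    rates[u]: transmission rate to device u (unused for u = 0).
    rho[u], f[u]: computing resource share and capacity of device u.
    """
    copies = []
    for u, selected in enumerate(x):
        if selected:
            transmit = 0.0 if u == 0 else s / rates[u]
            copies.append(transmit + c / (rho[u] * f[u]))
    return min(copies)
```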
step S305: and acquiring the transmission energy consumption of the simulation terminal to send the task to the simulation server.
Step S306: and acquiring first calculation energy consumption required by the simulation terminal to execute the task.
Step S307: and obtaining second calculation energy consumption required by the simulation server to execute the task.
In some embodiments, in order to describe parameters related to task energy consumption costs matched with a real edge computing system in the edge computing simulation system, the parameters need to be further refined to obtain transmission energy consumption of a simulation terminal to send a task to a simulation server, first computing energy consumption required by the simulation terminal to execute the task through a local computing mode, and second computing energy consumption required by the simulation server to execute the task.
Furthermore, in some embodiments, in order to more precisely describe the above transmission energy consumption, the first calculation energy consumption, and the second calculation energy consumption, a corresponding mathematical modeling is also required.
In an embodiment, referring to fig. 5, the obtaining the transmission energy consumption of the task sent by the simulation terminal to the simulation server, the obtaining the first calculation energy consumption required by the simulation terminal to execute the task, and the obtaining the second calculation energy consumption required by the simulation server to execute the task further includes the following steps S501 to S503:
step S501: and determining the transmission energy consumption based on the shannon formula model, the simulation channel and the task information.
Step S502: and determining first calculation energy consumption required by the simulation terminal to execute the task based on the terminal calculation resources and the task information.
Step S503: based on the server computing resources and the task information, a second computing energy consumption required by the simulation server to perform the task is determined.
In some embodiments, based on the above-mentioned simulation channel and task information and referring to the shannon formula model, after deriving the transmission delay, the transmission energy consumption with the transmission power of the simulation terminal as a variable may be further derived as:

$$E^{\mathrm{tx}}_{n,m}(t) = p_{n,m}(t)\,\frac{s_n(t)}{r_{n,m}(t)}.$$
in some embodiments, based on the terminal computing resources and the task information, a first computing energy consumption may be further derived with the computing resource allocation of the emulated terminal as a variable. Likewise, based on the server computing resources and the task information, a second computing energy consumption may be further derived that is variable to simulate the computing resource allocation of the server.
The first calculation energy consumption and the second calculation energy consumption are uniformly expressed as:

$$E^{\mathrm{comp}}_{n,u}(t) = \kappa\,\big(\rho_{n,u}(t)\, f_u\big)^{2}\, c_n(t),$$

wherein $\kappa$ is an energy efficiency performance parameter of the computing device.
Step S308: and determining task energy consumption cost according to the transmission energy consumption, the first calculation energy consumption and the second calculation energy consumption.
In some embodiments, the task energy consumption cost required to execute the task in the edge computing simulation system may be further derived based on the constructed transmission energy consumption, the first computing energy consumption, and the second computing energy consumption.
Thus, in the case of redundancy processing, the task energy consumption cost sums over all selected devices and is defined as:

$$E_n(t) = \sum_{u:\,x_{n,u}(t)=1}\Big(\mathbb{1}[u \neq 0]\, E^{\mathrm{tx}}_{n,u}(t) + E^{\mathrm{comp}}_{n,u}(t)\Big).$$
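The energy side can be sketched in the same style; the default value of the energy efficiency parameter is a typical illustrative figure, not one fixed by this embodiment:

```python
def task_energy_cost(x, s, c, rates, powers, rho, f, kappa=1e-27):
    """Task energy cost under redundancy: transmission energy for each offloaded
    copy plus the kappa * (rho * f)^2 * c computing energy on every selected device."""
    total = 0.0
    for u, selected in enumerate(x):
        if selected:
            if u != 0:  # an offloaded copy pays for uploading the data packet
                total += powers[u] * s / rates[u]
            total += kappa * (rho[u] * f[u]) ** 2 * c
    return total
```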
Step S309: and acquiring the reliable probability of task transmission.
In some embodiments, since the channel quality is affected by Rayleigh fading, strong variations may occur, causing the transmission rate per unit bandwidth to fall below a threshold and in turn interrupting communication. Task transmission is therefore considered reliable only when the channel quality stays above a certain threshold. In order to describe the task transmission reliability more precisely, corresponding mathematical modeling is required.
In an embodiment, referring to fig. 6, acquiring the task transmission reliability probability includes the following steps S601 to S602:
step S601: based on the simulation channel and the shannon formula model, the transmission rate of the simulation terminal for transmitting information to the simulation server is constructed by taking the transmission power of the simulation terminal as a variable.
Step S602: and calculating the probability that the transmission rate reaches a preset condition, and taking the probability as the reliable probability of task transmission.
In some embodiments, according to the shannon formula, the transmission rate of the $n$-th simulation terminal in the $t$-th slot is

$$r_{n,m}(t) = W \log_2\!\left(1 + \frac{p_{n,m}(t)\, g_{n,m}(t)}{I_{n,m}(t) + \sigma^2}\right),$$

wherein $\sigma^2$ represents the Gaussian white noise power.
In the non-orthogonal frequency division multiple access wireless communication technology, when the transmission rate per unit bandwidth is lower than a specific threshold $r_{\mathrm{th}}$, i.e., $r_{n,m}(t)/W < r_{\mathrm{th}}$, a transmission interruption is prone to occur and the task fails; moreover, a transmission interruption of a task with higher decoding priority simultaneously causes transmission interruption and failure of the subsequent tasks. Since the small-scale Rayleigh fading makes the channel power gain $|h_{n,m}(t)|^{2}$ an exponentially distributed random variable, the current task transmission reliability probability is

$$\Pr\!\left(\frac{r_{n,m}(t)}{W} \geq r_{\mathrm{th}}\right),$$

which admits a closed-form expression in terms of the transmission powers, the mean channel gains of the interfering terminals and the noise power, obtained by integrating over the exponential distribution of the fading variable.
step S310: and determining the reliability of task transmission according to the reliability probability of task transmission.
In some embodiments, based on the task transmission reliability probability, it may be further derived that the task transmission reliability of the task sent from simulation terminal $n$ to simulation server $m$ is the probability that neither the preceding tasks (with higher decoding priority) nor the current task suffers a transmission interruption. That is, in the $t$-th slot, the task transmission reliability of the $n$-th simulation terminal is the product of the reliability probabilities of the current task and of all tasks decoded before it:

$$R_{n,m}(t) = \prod_{k:\,g_{k,m}(t)\geq g_{n,m}(t)} \Pr\!\left(\frac{r_{k,m}(t)}{W} \geq r_{\mathrm{th}}\right).$$
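Where the closed-form expression is unwieldy, the reliability probability can also be checked numerically. The Monte-Carlo sketch below estimates the single-task reliability under the exponential fading-gain model; the sampling scheme and all names are illustrative assumptions:

```python
import numpy as np

def reliability_probability(p, mean_gain, interferers, noise_power, r_th,
                            trials=100_000, seed=0):
    """Estimate Pr(rate per unit bandwidth >= r_th) under Rayleigh fading.

    interferers: iterable of (power, mean_gain) pairs for the lower-gain
    terminals whose signals are treated as interference.
    """
    rng = np.random.default_rng(seed)
    g = mean_gain * rng.exponential(size=trials)  # |h|^2 is exponentially distributed
    interference = sum(pi * gi * rng.exponential(size=trials)
                       for pi, gi in interferers)
    sinr = p * g / (interference + noise_power)
    return float(np.mean(np.log2(1.0 + sinr) >= r_th))
```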
Step S311: constructing a simulation task optimization target according to task execution time, task energy consumption cost and task transmission reliability;
In some embodiments, based on the constructed mathematical modeling of task execution time, task energy consumption cost and task transmission reliability, the simulation task optimization modeling may be further constructed as:

$$\max_{x,\,p,\,\rho}\ \sum_{t}\sum_{n}\Big(\omega_1 R_n(t) - \omega_2 T_n(t) - \omega_3 E_n(t)\Big)$$
$$\text{s.t.}\quad x_{n,u}(t)\in\{0,1\},\quad \sum_{u} x_{n,u}(t)\geq 1,\quad 0\leq p_{n,m}(t)\leq p_{\max},\quad \rho_{n,u}(t)\geq 0,\quad \sum_{n}\rho_{n,u}(t)\leq 1,$$

wherein $p_{\max}$ is the maximum transmission power and the hyperparameters $\omega_1,\omega_2,\omega_3$ control the weight of each term.
The optimization objective is to maximize the task transmission reliability while minimizing the task execution time delay and the task energy consumption cost required when executing the task in the edge computing simulation system; the optimization variables are the task offloading scheduling selection, the transmission power selection of the simulation terminal, and the computing resource allocation of the simulation terminal and the simulation server; the constraints are the selectable range of the task offloading scheduling, the task offloading scheduling target indicator, the selectable range of the transmission power of the simulation terminal, and the selectable ranges of the computing resource allocation of the simulation terminal and the simulation server.
In addition, the simulation task optimization targets comprise a first optimization target and a second optimization target which are needed to be achieved when a simulation terminal and a simulation server process the tasks in the edge computing simulation system.
In some embodiments, the simulation task optimization modeling constructed as described above is difficult to solve directly because of its complexity. Therefore, in the form of hierarchical optimization, the task offloading scheduling selection and the transmission power selection of the simulation terminals among the optimization variables are solved through a deep reinforcement learning algorithm framework, and the computing resource allocation of the simulation terminals and the simulation servers among the optimization variables is then solved through a convex optimization mathematical model.
In some embodiments, based on the characteristics of deep reinforcement learning, solving the task offloading scheduling selection and the transmission power selection of the simulation terminal through a deep reinforcement learning algorithm framework requires selecting a model and training it multiple times.
Step S202: in the edge computing simulation system, training a multi-agent deep reinforcement learning model to obtain a task unloading scheduling algorithm so as to enable task execution time, task energy consumption cost and task transmission reliability required by a simulation terminal and/or a simulation server to achieve a first optimization target.
In some embodiments, a multi-agent deep reinforcement learning model is adopted, targeting that the task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to execute the task reach the first optimization target, and the multi-agent deep reinforcement learning model is trained multiple times in the edge computing simulation system, thereby obtaining a task offloading scheduling algorithm satisfying the first optimization target.
In some embodiments, the first optimization objective is an optimization objective preset based on the task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to execute the task. The first optimization objective may be to minimize the task execution time and task energy consumption cost required by the simulation terminal and/or the simulation server to execute the task while maximizing the task transmission reliability, by adjusting the task offloading scheduling selection and the transmission power selection of the simulation terminal.
In some embodiments, it is desirable to construct a multi-agent deep reinforcement learning model, including:
each simulation terminal is taken as an agent, and it can be understood that a plurality of simulation terminals are a plurality of agents.
Building the state space: the state $s_n(t)$ of the multi-agent deep reinforcement learning model comprises: the amount of computation required by the task, the task data packet size, the terminal computing resources, the server computing resources, the current simulation channel state, the offloading decision of the task in the last time slot, the offloading transmission power of the task in the last time slot, and the interference condition of the channel in the last time slot. Wherein $s_n(t)$ denotes the relevant state information of the $n$-th simulation terminal in time slot $t$.
Building the action space: for the arriving task, the action comprises a target offloading position $a_n(t)$ and an offloading transmission power $p_n(t)$. Wherein the target offloading position $a_n(t)$ denotes the execution allocation adopted by the $n$-th simulation terminal for executing its task in time slot $t$, and the offloading transmission power $p_n(t)$ denotes the transmission power adopted by the $n$-th simulation terminal for executing its task in time slot $t$.
Defining rewards: since the target framework of deep reinforcement learning is to maximize the reward, the maximized reward is defined accordingly as $G_n=\sum_{t=1}^{T}\gamma^{t} r_n(t)$, wherein $G_n$ denotes the cumulative reward of the $n$-th simulation terminal, i.e., the rewards accumulated by the $n$-th simulation terminal over the $T$ time slots, discounted by the coefficient $\gamma$.
Building the neural networks: the following networks are deployed at each agent: an offloading decision network $\mu(s;\theta^{\mu})$ for outputting the offloading decision of the current agent according to the current state, wherein $\theta^{\mu}$ is its parameter; a transmission power distribution network $\pi(s,a;\theta^{\pi})$ for outputting the transmission power decision of the current agent based on the current state and the offloading decision, wherein $\theta^{\pi}$ is its parameter; and a local Q network $Q(s,a,p;\theta^{Q})$ for estimating the local Q value based on the current state, the offloading decision and the transmission power decision, wherein $\theta^{Q}$ is its parameter. In addition, in the edge computing simulation system there is a global Q network $Q_{\mathrm{tot}}(\cdot;\theta^{G})$ for estimating the global Q value based on the current states, offloading decisions and transmission power decisions of all agents, wherein $\theta^{G}$ is its parameter. Each network adopts at least two fully-connected layers, wherein the hidden layers are activated by the ReLU function and the output layer is scaled by the Sigmoid function to the $[0,1]$ interval. In addition, target networks with the same structure and the same initial parameters are constructed respectively, namely an offloading decision target network, a transmission power distribution target network, a local target network and a global target network, with corresponding parameters $\theta^{\mu'},\theta^{\pi'},\theta^{Q'},\theta^{G'}$.
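A compact PyTorch sketch of this per-agent network family follows; the layer widths and input dimensions are placeholder assumptions, and the single `mlp` helper stands in for the at-least-two-layer fully-connected structure described above:

```python
import copy
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    # Fully-connected layers: ReLU-activated hidden layer, Sigmoid output in [0, 1]
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim), nn.Sigmoid())

STATE_DIM, NUM_DEVICES = 32, 5  # placeholder sizes (NUM_DEVICES = M + 1)
offload_net = mlp(STATE_DIM, NUM_DEVICES)              # relaxed offloading decision
power_net = mlp(STATE_DIM + NUM_DEVICES, NUM_DEVICES)  # power from state + decision
local_q_net = mlp(STATE_DIM + 2 * NUM_DEVICES, 1)      # local Q value

# Each target network shares structure and initial parameters with its source:
offload_target = copy.deepcopy(offload_net)
power_target = copy.deepcopy(power_net)
local_q_target = copy.deepcopy(local_q_net)
```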
It will be appreciated that the target network and the corresponding Q network (including the local Q network, the global Q network, the offloading decision network, and the transmission power allocation network) possess the same structure and initial parameters, but act differently. The Q network is mainly used for estimating the Q value of each action in the current state, guiding the intelligent agent to select the optimal action in the current state, and the target network is mainly used for providing a fixed target Q value for calculating a training target value, reducing the fluctuation of the target value in the training process and improving the stability of an algorithm.
Wherein, the ReLU function, known in full as the rectified linear unit, is a commonly used activation function in neural networks; similarly, the Sigmoid function is also a commonly used activation function in neural networks.
Building the experience pool: the experience pool is a limited memory space with a first-in first-out structure. For each interaction, the states $s(t)$ of all agents of the system, the selected actions $(a(t), p(t))$, the obtained rewards $r(t)$ and the states $s(t+1)$ entered after action execution, i.e., the tuple $(s(t), a(t), p(t), r(t), s(t+1))$, are saved to the experience pool.
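Such a first-in first-out experience pool can be sketched in a few lines (the capacity value is illustrative):

```python
import random
from collections import deque

class ExperiencePool:
    """Limited FIFO memory of (state, action, reward, next_state) records."""

    def __init__(self, capacity=100_000):
        self.records = deque(maxlen=capacity)  # oldest records are evicted first

    def save(self, state, action, reward, next_state):
        self.records.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.records, batch_size)
```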
In some embodiments, referring to fig. 7, in an edge computing simulation system, training a multi-agent deep reinforcement learning model to obtain a task offloading scheduling algorithm includes the following steps S701 to S704:
Step S701: and acquiring simulation state information of the current time slot of the simulation terminal.
In some embodiments, the simulation state information $s_n(t)$ of the current time slot of the simulation terminal is acquired.
Step S702: and obtaining an alternative action decision corresponding to the simulation state information.
In some embodiments, based on the current simulation state information $s_n(t)$, the corresponding selectable alternative action decisions are generated. Wherein an alternative action decision comprises a target offloading position $a_n(t)$ and an offloading transmission power $p_n(t)$.
Step S703: determining a reward model according to the first optimization objective.
In one embodiment, based on the first optimization objective, the real-time reward is defined as $r_n(t)$, which refers to the reward earned in time slot $t$.
Step S704: and taking the simulation state information and the alternative action decision as training samples, and performing multi-round training on the multi-agent deep reinforcement learning model to obtain a trained multi-agent deep reinforcement learning model as a task unloading scheduling algorithm.
In some embodiments, the simulation state information for each agent will vary for different time slots due to task arrival and the time variability of the wireless channel; moreover, alternative action decisions corresponding to different simulation state information are also different. Therefore, different simulation state information and alternative action decisions need to be used as training samples, and multiple rounds of training are performed on the multi-agent deep reinforcement learning model, so that the selected target unloading position and the unloading transmission power can reach a first optimization target aiming at different simulation state information. And taking the trained multi-agent deep reinforcement learning model as a task unloading scheduling algorithm.
In order to train the multi-agent deep reinforcement learning model more accurately, further refinement of the training step is required.
In some embodiments, referring to fig. 8, a training sample is input into the multi-agent deep reinforcement learning model and the multi-agent deep reinforcement learning model is adjusted using the reward model, comprising the following steps S801 to S813:
step S801: and inputting the simulation state information into an unloading decision network to obtain a relaxation task unloading action.
In some embodiments, the simulation state information $s_n(t)$ is input as an input parameter into the offloading decision network, so that the offloading decision network matches out the selectable relaxed task offloading action corresponding to the simulation state information, $\hat{a}_n(t)\in[0,1]^{M+1}$.
Step S802: and expanding the relaxation task unloading action based on a task unloading action expanding algorithm to obtain an alternative unloading action group.
In some embodiments, owing to redundancy processing, a task offloading action may select a combination of multiple simulation servers; traversing all action combinations would consume large computing resources. Based on this, a task offloading action expansion algorithm is adopted to expand the relaxed task offloading action, which effectively reduces the number of action combinations to be computed. Thus, in this embodiment, the current relaxed task offloading action is expanded into $K$ alternative offloading actions $\{a^{1}_n(t),\dots,a^{K}_n(t)\}$ by the task offloading action expansion algorithm.
In an embodiment, referring to fig. 9, the task offloading action is extended based on a task offloading action extension algorithm to obtain an alternative offloading action group, including the following steps S901 to S903:
step S901: and processing the relaxation task unloading action based on the task unloading action expansion algorithm and the dichotomy to obtain a first alternative unloading action.
Step S902: and processing the relaxed task unloading action based on the task unloading action expansion algorithm, the dichotomy and the ascending order to obtain a second alternative unloading action.
Step S903: the first alternative unloading action and the second alternative unloading action form an alternative unloading action group.
In some embodiments, for a given state, the offloading decision network generates a relaxed action decision $\hat{a}_n(t)\in[0,1]^{M+1}$. The task offloading action expansion algorithm processes the relaxed action decision and expands it into the $K$ offloading action decisions $\{a^{1}_n(t),\dots,a^{K}_n(t)\}$ of the $n$-th simulation terminal in the $t$-th slot, with $a^{k}_n(t)\in\{0,1\}^{M+1}$.

For the first offloading action decision, based on the dichotomy idea, the first alternative offloading action is obtained by rounding each component at one half:

$$a^{1}_{n,u}(t) = \begin{cases}1, & \hat{a}_{n,u}(t) \geq 0.5,\\ 0, & \text{otherwise}.\end{cases}$$

For the second to the $K$-th actions, the components of the relaxed action decision are first sorted in ascending order to obtain the sequence $\hat{a}_{(1)} \leq \hat{a}_{(2)} \leq \dots \leq \hat{a}_{(M+1)}$; based on the dichotomy idea, the $k$-th alternative offloading action ($k=2,\dots,K$) is obtained by thresholding at the $(k-1)$-th ordered value:

$$a^{k}_{n,u}(t) = \begin{cases}1, & \hat{a}_{n,u}(t) > \hat{a}_{(k-1)},\\ 0, & \text{otherwise}.\end{cases}$$

Then, the first alternative offloading action and the second to the $K$-th alternative offloading actions form the alternative offloading action group.
Step S803: and inputting the simulation state information and the alternative unloading action group into a transmission power distribution network to obtain an alternative power distribution space.
In some embodiments, the simulation state information and the alternative offloading action group are used as input parameters of the transmission power distribution network to obtain an alternative power allocation space composed of the transmission power allocation actions corresponding to each alternative offloading action. Wherein each action group is made up of an alternative offloading action and its corresponding transmission power allocation action.
In some embodiments, the simulation state information and the $K$ alternative offloading actions $a^{k}_n(t)$ are input into the transmission power distribution network, correspondingly generating $K$ transmission power allocation actions $p^{k}_n(t)$, which compose the $K$ action groups $(a^{k}_n(t), p^{k}_n(t))$ and further constitute the available alternative offloading action space.
Step S804: and inputting the alternative power distribution space into a local Q network to obtain a local Q value, and taking an alternative power distribution action corresponding to the maximum local Q value as a local action decision.
In one embodiment, the local Q network is employed to estimate a local Q value for each action group, and the action group with the largest local Q value, $(a^{*}_n(t), p^{*}_n(t))$, is selected as the local action decision, with normally distributed noise added to the transmission power allocation, i.e., $p_n(t) = p^{*}_n(t) + \mathcal{N}(0,\sigma_e^{2})$.
Step S805: and executing a local action decision based on the task information and the rewarding model to obtain a local simulation rewarding.
In one embodiment, based on the acquired task information and reward model, the simulation terminal is made to execute the local action decision in the edge computing simulation system, the transmission delay, the transmission energy consumption and the task transmission reliability are calculated, and the corresponding local simulation reward may be obtained.
Step S806: and inputting the local Q value into a global Q network to obtain the global Q value.
In an embodiment, the optimization of the whole edge computing simulation system not only considers the performance obtained by a single simulation terminal, but also comprehensively considers the overall performance optimization of all simulation terminals, so that the global Q value needs to be further deduced by collecting the local Q values of all simulation terminals. The global Q value is used for representing the overall performance of all simulation terminals in the whole edge computing simulation system.
In an embodiment, referring to fig. 10, the local Q value is input into the global Q network to obtain the global Q value, including the following steps S1001 to S1002:
Step S1001: and inputting the local Q value into a first processing module, so that the first processing module takes the simulation state information as a first weight, and processes the local Q value in combination with the first weight to obtain a high-dimensional feature vector.
Step S1002: and inputting the high-dimensional feature vector into a second processing module, so that the second processing module takes the simulation state information as a second weight, and processing the high-dimensional feature vector by combining the second weight to obtain a global Q value.
In some embodiments, the global Q network structure includes two processing modules: for the first processing module, a first fully-connected layer processes the agents' simulation state information into the first weight, and a second fully-connected layer uses the first weight to process the agents' local Q values into a high-dimensional feature vector; for the second processing module, a third fully-connected layer processes the agents' simulation state information into the second weight, and a fourth fully-connected layer uses the second weight to map the high-dimensional feature vector into the global Q value.
Step S807: and obtaining a global action decision based on the global Q value.
In one embodiment, after the main server collects the local Q values of all the agents, a global Q value may be obtained, and a global action decision may be obtained according to the global Q value and the local action decisions of all the agents.
Step S808: and executing global action decision based on the task information and the rewarding model to obtain global simulation rewards.
In one embodiment, based on the acquired task information and reward model, all agents in the edge computing simulation system execute the global action decision, the transmission delay, the transmission energy consumption and the task transmission reliability are calculated, and the corresponding global simulation reward may be obtained.
Step S809: the offload decision network is updated based on the relaxed task offload actions and the local action decisions.
In one embodiment, referring to fig. 11, the update offload decision network based on the relaxed task offload actions and the local action decisions, includes the following steps S1101 to S1102:
step S1101: taking the mean square error of the relaxation task unloading action and the local action decision as an unloading decision network error.
Step S1102: and updating the unloading decision network based on the gradient descent model and the unloading decision network error.
In one embodiment, the updating of the offloading decision network is accomplished by each agent computing, based on the simulation state information $s_n(t)$ of the current time slot, the relaxed task offloading action $\hat{a}_n(t)$ of the $n$-th simulation terminal in the $t$-th slot through the offloading decision network. The mean square error between the relaxed task offloading action and the recorded optimal action actually taken, $\|\hat{a}_n(t) - a^{*}_n(t)\|^{2}$, is reduced by gradient descent, thereby updating the parameters $\theta^{\mu}$ of the offloading decision network.
Step S810: the local Q network is updated based on the local simulation rewards, the local action decisions and the local Q value of the next slot.
In one embodiment, referring to fig. 12, updating the local Q network based on the local simulation rewards, the local action decisions and the local Q value of the next slot, includes the following steps S1201 to S1204:
step S1201: and obtaining a local action decision Q value corresponding to the local action decision based on the local action decision.
Step S1202: the local Q value of the next slot plus the local simulation reward is taken as the local network difference value.
Step S1203: and taking the mean square error of the local action decision Q value and the local network difference value as the local network error.
Step S1204: the local Q network is updated based on the gradient descent model and the local network error.
In one embodiment, each agent, for the next time slot state $s_n(t+1)$, first computes the relaxed task offloading action of the $n$-th simulation terminal in slot $t+1$ through the offloading decision target network, and expands it into the $K$ alternative actions $\{a^{1}_n(t+1),\dots,a^{K}_n(t+1)\}$. The corresponding transmission power allocation actions $p^{k}_n(t+1)$ are computed through the transmission power distribution target network, composing the $K$ action groups $(a^{k}_n(t+1), p^{k}_n(t+1))$ of the $n$-th simulation terminal. The local target network calculates a local Q value for each action group, and the maximum is selected as the local Q value of the next time slot: $Q'_n = \max_k Q(s_n(t+1), a^{k}_n(t+1), p^{k}_n(t+1);\theta^{Q'})$. At the same time, for the current time slot state and the recorded action group, the current local Q value is calculated by the local Q network: $Q_n = Q(s_n(t), a_n(t), p_n(t);\theta^{Q})$. Finally, the TD error $\big(r_n(t) + \gamma Q'_n - Q_n\big)^{2}$ is reduced by gradient descent, updating the parameters $\theta^{Q}$.
It will be appreciated that TD error is a common concept in reinforcement learning to measure the difference between an estimated value and a target value. In reinforcement learning, TD errors are typically used to update a cost function or strategy.
Step S811: the global Q network is updated based on the global simulation rewards, the global action decisions and the global Q value of the next slot.
In an embodiment, referring to fig. 13, the global Q network is updated based on the global simulation rewards, the global action decisions and the global Q value of the next slot, comprising the following steps S1301 to S1304:
step S1301: and obtaining a global action decision Q value corresponding to the global action decision based on the global action decision.
Step S1302: and adding the global Q value of the next time slot with the global simulation reward as a global network difference value.
Step S1303: and taking the mean square error of the global action decision Q value and the global network difference value as the global network error.
Step S1304: and updating the global Q network based on the gradient descent model and the global network error.
In one embodiment, the main server collects the next time slot states $s(t+1)$ and local Q values $Q'_n$ of all agents and estimates the global Q value of the next time slot through the global target network: $Q'_{\mathrm{tot}} = Q_{\mathrm{tot}}(s(t+1), Q'_1,\dots,Q'_N;\theta^{G'})$. It simultaneously collects the current time slot states $s(t)$ and local Q values $Q_n$ of all agents and estimates the global Q value of the current time slot through the global Q network: $Q_{\mathrm{tot}} = Q_{\mathrm{tot}}(s(t), Q_1,\dots,Q_N;\theta^{G})$. The TD error $\big(r(t) + \gamma Q'_{\mathrm{tot}} - Q_{\mathrm{tot}}\big)^{2}$, wherein $r(t)$ is the global simulation reward, is reduced by gradient descent, updating the parameters $\theta^{G}$.
Step S812: the transmission power distribution network is updated based on the global Q value.
In an embodiment, referring to fig. 14, updating a transmission power distribution network based on a global Q value includes the following steps S1401 to S1402:
step S1401: and taking the negative number of the global Q value as a transmission power distribution network error.
Step S1402: the transmission power distribution network is updated based on the gradient descent model and the transmission power distribution network error.
In one embodiment, for the current time slot state $s_n(t)$ of the $n$-th simulation terminal and the currently recorded selected offloading action $a_n(t)$, the transmission power allocation action is calculated by the transmission power distribution network, and the local Q value is calculated by the local Q network. After the local Q values of all agents are collected, the global Q value is calculated by the global Q network. The transmission power distribution network update objective is to maximize the global Q value, so its negative $-Q_{\mathrm{tot}}$ is decreased by gradient descent, updating the parameters $\theta^{\pi}$.
In one embodiment, referring to fig. 15, the training process of the multi-agent deep reinforcement learning model is:
1) Randomly initializing all neural network parameters, setting the learning rate, the cumulative reward attenuation coefficient $\gamma$, the noise variance $\sigma_e^{2}$, the action expansion number $K$ and the soft update coefficient $\tau$; the target iteration number is 3000, and the time slot length of a single iteration process is 100. Setting the reward value weight hyperparameters $\omega_1,\omega_2,\omega_3$;
2) Initializing the iteration number to be 1;
3) Setting the current time slot as $t=1$ and initializing the edge computing simulation system;
4) Acquiring the required simulation state information $s_n(t)$ and inputting it into the offloading decision network to obtain the relaxed task offloading action $\hat{a}_n(t)\in[0,1]^{M+1}$;
5) Expanding the current relaxed task offloading action into the $K$ alternative offloading actions $\{a^{1}_n(t),\dots,a^{K}_n(t)\}$ by adopting the task offloading action expansion algorithm;
6) Inputting the current simulation state information and the $K$ alternative offloading actions into the transmission power distribution network, correspondingly generating the $K$ transmission power allocation actions $p^{k}_n(t)$ and composing the $K$ action groups $(a^{k}_n(t), p^{k}_n(t))$;
7) Estimating a local Q value for each action group using the local Q network, selecting the action group with the largest local Q value $(a^{*}_n(t), p^{*}_n(t))$, and adding normally distributed noise to the transmission power allocation: $p_n(t) = p^{*}_n(t) + \mathcal{N}(0,\sigma_e^{2})$;
8) Interacting with the edge computing simulation system, simulating task offloading to each computing device, and calculating the transmission delay, the transmission energy consumption and the task transmission reliability;
9) Invoking the computing resource allocation algorithm, simulating the processing that completes the task, and calculating the task execution time delay $T_n(t)$ and the task energy consumption cost $E_n(t)$;
10) According to the statistical results of the task execution time delay, the task energy consumption cost and the reliability index, obtaining the current reward value $r_n(t)$; updating the environment and collecting the simulation state information $s_n(t+1)$ of the next time slot; saving the current simulation state information, the taken action, the reward value and the next-slot simulation state information, $(s_n(t), a_n(t), p_n(t), r_n(t), s_n(t+1))$, to the experience pool;
11) Updating $t \leftarrow t+1$ and cycling through the above interactions until the time slot length of a single iteration process is reached;
12) Sampling a batch of data from the experience pool;
13) Offloading decision network update: each agent, based on the current time slot state $s_n(t)$, obtains the relaxed task offloading action $\hat{a}_n(t)$ by computing the offloading decision network; the mean square error between the relaxed task offloading action and the recorded optimal action actually taken is reduced by gradient descent, updating the parameters $\theta^{\mu}$;
14) Local Q network update: each agent, based on the next time slot state $s_n(t+1)$, obtains the relaxed task offloading action through the offloading decision target network and expands it into the $K$ alternative actions; the corresponding transmission power allocation actions are obtained through the transmission power distribution target network, composing the $K$ action groups. A local Q value is calculated for each action group through the local target network, and the maximum is selected as the local Q value of the next time slot. At the same time, for the current time slot state and the recorded action group, the current local Q value is calculated by the local Q network. The TD error is reduced by gradient descent, updating the parameters $\theta^{Q}$;
15) Global Q network update: the main server collects the next-slot simulation state information and next-slot local Q values of all agents and estimates the global Q value of the next time slot through the global target network; it simultaneously collects the current-slot simulation state information and local Q values of all agents and estimates the global Q value of the current time slot through the global Q network. The TD error is reduced by gradient descent, updating the parameters $\theta^{G}$;
16) Transmission power distribution network update: for the simulation state information of the current time slot and the currently recorded selected offloading action, the transmission power allocation action is calculated by the transmission power distribution network, and the local Q value is calculated by the local Q network. After the local Q values of all agents are collected, the global Q value is calculated by the global Q network. The update objective of the transmission power distribution network is to maximize the global Q value, which is achieved by gradient descent on its negative, updating the parameters $\theta^{\pi}$;
17) Updating each target network parameter by soft update: $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$, applied to $\theta^{\mu'}$, $\theta^{\pi'}$, $\theta^{Q'}$ and $\theta^{G'}$;
18) Adding one to the iteration number, returning to 3), and repeating the updating process until the target iteration number is reached.
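The soft update in step 17) has the usual form, sketched below for PyTorch-style modules; the value of the soft update coefficient is illustrative:

```python
def soft_update(target_net, source_net, tau=0.01):
    """theta' <- tau * theta + (1 - tau) * theta' for each parameter pair."""
    for tgt, src in zip(target_net.parameters(), source_net.parameters()):
        tgt.data.copy_(tau * src.data + (1.0 - tau) * tgt.data)
```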
In an embodiment, the variations of the average transmission energy consumption, the average transmission delay and the average task transmission reliability are shown in fig. 16, 17 and 18, respectively. The horizontal axes of fig. 16, 17 and 18 are the number of iterations of the multi-agent deep reinforcement learning model (i.e., the number of training rounds of the model); fig. 16 shows the change of the average transmission energy consumption obtained when training the multi-agent deep reinforcement learning model in the edge computing simulation system, fig. 17 the change of the average transmission delay, and fig. 18 the change of the average task transmission reliability. Evidently, when the number of iterations exceeds 1500, the average transmission energy consumption, the average transmission delay and the average task transmission reliability all tend to stable values. In other words, after training the multi-agent deep reinforcement learning model 1500 times, a trained multi-agent deep reinforcement learning model can be preliminarily obtained, from which the task offloading scheduling algorithm is then derived. It can be understood that when the computing resources are sufficient, increasing the training times (i.e., the iteration number) of the multi-agent deep reinforcement learning model can improve the performance of the obtained task offloading scheduling algorithm, and further, in an actual edge computing system, reduce the task execution time and task energy consumption cost required by the terminal to execute the target task and improve the reliability of the task transmission process.
Step S813: and obtaining a task unloading scheduling algorithm according to the unloading decision network, the transmission power distribution network and the local Q network in the trained multi-agent deep reinforcement learning model.
In an embodiment, after the training, the offloading decision network, the transmission power distribution network and the local Q network are extracted from the trained multi-agent deep reinforcement learning model as the task offloading scheduling algorithm, so that in an actual edge computing system the terminal can execute the target task according to these networks, thereby reducing the task execution time and task energy consumption cost and improving the reliability of the task transmission process.
Step S203: in the edge computing simulation system, a computing resource allocation algorithm is acquired based on a convex optimization mathematical model so that task execution time and task energy consumption cost reach a second optimization target.
In some embodiments, based on the constructed simulation task optimization modeling, for computing resource allocation of the simulation terminal and the simulation server in the optimization variables, a convex optimization mathematical model is adopted to solve the simulation terminal and the simulation server to obtain a computing resource allocation algorithm, so that task execution time and task energy consumption cost reach a second optimization target.
The second optimization target is an optimization target preset based on the task execution time and task energy consumption cost required by the simulation terminal and/or the simulation server to execute the task. The second optimization objective may be to minimize the task execution time and task energy consumption cost required by the simulation terminal and/or the simulation server to execute the task by adjusting the computing resource allocation.
In an embodiment, referring to fig. 19, in the edge computing simulation system, based on the convex optimization mathematical model, a computing resource allocation algorithm is acquired so that the task execution time and the task energy consumption cost reach the second optimization target, including the following steps S1601 to S1605:
Step S1601: and constructing a computing resource optimization function according to the task execution time and the task energy consumption cost.
Step S1602: the computing resource allocation is used as a computing resource optimization variable.
Step S1603: the allocable range of the computing resource allocation is taken as the computing resource constraint.
Step S1604: and constructing a computing resource optimization problem based on the computing resource optimization function, the computing resource optimization variable and the computing resource constraint.
Step S1605: and processing the calculation resource optimization problem by using the convex optimization mathematical model to obtain a calculation resource allocation algorithm.
In some embodiments, according to the mathematical description in the edge computing simulation system constructed as described above, the computing resource optimization function of the $t$-th time slot is further constructed from the task execution time and the task energy consumption cost:

$$F(\rho) = \sum_{n}\Big(\omega_2\, d^{\mathrm{comp}}_{n}(t) + \omega_3\, E^{\mathrm{comp}}_{n}(t)\Big),$$

and serves as the optimization target; the computing resource allocation $\rho$ is taken as the computing resource optimization variable; the allocable range of the computing resource allocation is taken as the computing resource constraint. The computing resource optimization problem is constructed as:

$$\min_{\rho}\ F(\rho)\quad \text{s.t.}\quad \rho_{n,u}(t)\geq 0,\quad \sum_{n}\rho_{n,u}(t)\leq 1,$$

wherein $d^{\mathrm{comp}}_{n}(t)$ refers to the first calculation time delay and/or the second calculation time delay required by the $t$-th time slot to execute the target task, and $E^{\mathrm{comp}}_{n}(t)$ refers to the first calculation energy consumption and/or the second calculation energy consumption required by the $t$-th time slot to execute the target task.
Since the computing resource optimization problem is a convex optimization problem, it can be solved by a solver library such as CVXPY.
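For instance, a per-device slice of the problem can be posed in CVXPY as below; the task sizes, device capacity, weights and the kappa value are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

c = np.array([2e8, 4e8, 1e8])      # CPU cycles required by the tasks on one device
f = 1e10                           # device computing capacity (cycles per second)
kappa, w_t, w_e = 1e-27, 1.0, 1.0  # energy parameter and objective weights

rho = cp.Variable(len(c), nonneg=True)               # computing resource shares
delay = cp.sum(cp.multiply(c, cp.inv_pos(rho))) / f  # sum of c_n / (rho_n * f), convex
energy = kappa * f ** 2 * cp.sum(cp.multiply(c, cp.square(rho)))  # kappa (rho_n f)^2 c_n
problem = cp.Problem(cp.Minimize(w_t * delay + w_e * energy), [cp.sum(rho) <= 1])
problem.solve()
print(rho.value)  # optimal allocation proportions
```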
Step S204: and sending the task unloading scheduling algorithm and the computing resource allocation algorithm to the terminal so that the target task of the terminal is processed according to the task unloading scheduling algorithm and the computing resource allocation algorithm.
Step S205: and sending the computing resource allocation algorithm to the edge server so that the edge server processes the received target task according to the computing resource allocation algorithm.
In some embodiments, when the task processing method applied to the main server described above is executed, the main server builds an edge computing simulation system matched with the edge computing system, trains the task offloading scheduling algorithm of the multi-agent deep reinforcement learning model in the edge computing simulation system, and acquires the computing resource allocation algorithm. The task offloading scheduling algorithm is then sent to the terminal so that the target task of the terminal is scheduled according to it, and the computing resource allocation algorithm is sent to the terminal and/or the edge server so that the terminal and/or the edge server can process the target task based on it, thereby effectively reducing the task execution time and task energy consumption cost required for executing the target task while effectively improving the reliability of the target task transmission process, and further improving the service quality of the edge computing system.
Next, with reference to fig. 20, a task processing method of another edge computing system according to an embodiment of the present application is further described, where the task processing method is applied to an edge server in the edge computing system.
Referring to fig. 20, which is a flowchart of a task processing method applied to an edge server in an embodiment of the present application, the method in fig. 20 may include, but is not limited to, steps S2001 to S2005. Meanwhile, it should be understood that the order of steps S2001 to S2005 in fig. 20 is not particularly limited, and the order of steps may be adjusted, or some steps may be reduced or added according to actual requirements.
Step S2001: a computing resource allocation algorithm generated from the host server by executing the task processing method is received.
Step S2002: and sending the current edge state information to the terminal.
Step S2003: a target task is received from the terminal.
Step S2004: and executing a computing resource allocation algorithm according to the target task to obtain an edge computing strategy.
Step S2005: and scheduling the computing resource to process the target task based on the edge computing strategy.
In some embodiments, after the main server obtains the computing resource allocation algorithm through the task processing method, the computing resource allocation algorithm is sent to the edge server for deployment. The edge server then transmits edge state information containing information characterizing the computing resources of the edge server to the terminal, so that the terminal can select whether to transmit the target task to the edge server according to the edge state information. After receiving the target task from the terminal, the edge server can execute a computing resource allocation algorithm according to the target task, so that an edge computing strategy is obtained, computing resources are mobilized to process the target task based on the edge computing strategy, and further task execution time and task energy consumption cost required by executing the target task are reduced by the edge server.
A further task processing method of an edge computing system according to an embodiment of the present application is further described below with reference to fig. 21-22, where the task processing method is applied to a terminal in the edge computing system.
Referring to fig. 21, which is a flowchart of a task processing method applied to a terminal in an embodiment of the present application, the method of fig. 21 may include, but is not limited to, steps S2101 to S2105. Meanwhile, it should be understood that the order of steps S2101 to S2105 in fig. 21 is not particularly limited, and the order of steps may be adjusted, or some steps may be reduced or increased according to actual requirements.
Step S2101: a task offload scheduling algorithm and a computing resource allocation algorithm generated by executing a task processing method from a host server are received.
Step S2102: and acquiring the current target task and wireless communication environment information.
Step S2103: edge state information is received from an edge server.
Step S2104: and executing a task unloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling strategy.
Step S2105: unloading the target task to at least one edge server for processing or processing the target task locally based on a task scheduling policy; and when the target task is processed locally, processing the target task according to a computing resource allocation algorithm.
In some embodiments, after obtaining the task offloading scheduling algorithm and the computing resource allocation algorithm through the task processing method, the main server sends them to the terminal for deployment. The terminal acquires the current target task, the wireless communication environment information and the edge state information from the edge server, so that it can execute the task offloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling policy characterizing the execution mode of the target task. Then, according to the task scheduling policy, the terminal offloads the target task to at least one edge server for processing or processes the target task locally; when the terminal processes the target task locally, it processes the target task according to the computing resource allocation algorithm, and after receiving the target task from the terminal, the edge server likewise processes it according to the computing resource allocation algorithm. The terminal can thereby reduce the task execution time and task energy consumption cost required for executing the target task while improving the reliability of the task transmission process.
In an embodiment, the generation mode and source of the target task are not limited, i.e. the target task may be generated automatically by collecting peripheral data of the terminal itself, or may be sent to the terminal by other devices.
In order to describe the process of using the task offload scheduling algorithm by the terminal more accurately, the process needs to be further refined.
In an embodiment, referring to fig. 22, a task offloading scheduling algorithm is performed according to a target task, wireless communication environment information, and edge state information, to obtain a task scheduling policy, including the following steps S2201 to S2204:
step S2201: and taking the target task, the wireless communication environment information and the edge state information as parameters, and inputting the parameters into an unloading decision network to obtain the unloading action of the target relaxation task.
Step S2202: and expanding the target relaxation task unloading action based on a task unloading action expansion algorithm to obtain a target alternative unloading action group.
Step S2203: and taking the target task, the wireless communication environment information, the edge state information and the target alternative offloading action group as parameters, and inputting the parameters into the transmission power distribution network to obtain a target transmission power allocation space.
Step S2204: and inputting the target transmission power distribution space into a local Q network to obtain a task scheduling strategy.
In one embodiment, the main server builds a task offloading scheduling algorithm based on a multi-agent deep reinforcement learning model trained in the edge computing simulation system by using an offloading decision network, a transmission power distribution network and a local Q network, and then sends the task offloading scheduling algorithm to the terminal, so that the terminal can execute a target task according to the task offloading scheduling algorithm in the edge computing system.
In one embodiment, for each task arrival slot, each terminal monitors and gathers the required information, invokes a task offload scheduling algorithm, decides offload scheduling locations and radio transmission power. According to the decision result, performing task calculation processing locally or establishing a wireless communication channel based on a non-orthogonal frequency division multiple access technology with an unloading target edge server, and then uploading a data packet of a target task by adopting a given transmission power. The specific steps of deciding to unload the scheduling position and the wireless transmission power include:
1) Acquiring the required state information $s_n(t)$ and inputting it into the offloading decision network to obtain the relaxed task offloading action $\hat{a}_n(t)\in[0,1]^{M+1}$;
2) Expanding the current target relaxed task offloading action into $K$ alternative offloading actions $\{a^{1}_n(t),\dots,a^{K}_n(t)\}$ by adopting the task offloading action expansion algorithm, as the target alternative offloading action group;
3) Inputting the current state information and the $K$ alternative offloading actions into the transmission power distribution network, correspondingly generating $K$ transmission power allocation actions $p^{k}_n(t)$, composing $K$ action groups $(a^{k}_n(t), p^{k}_n(t))$ and forming the target transmission power allocation space;
4) Estimating a local Q value for each action group using the local Q network, and selecting the action group with the largest local Q value $(a^{*}_n(t), p^{*}_n(t))$ as the current offloading scheduling position decision and wireless transmission power decision;
5) And obtaining a task scheduling strategy according to the unloading scheduling position decision and the wireless transmission power decision.
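Assembled end to end, the terminal-side decision of steps 1) to 5) looks like the following sketch, reusing the expansion helper shown earlier and assuming the three networks are plain callables returning NumPy arrays:

```python
def decide_offload(state, offload_net, power_net, local_q_net, k):
    """Pick the action group with the largest local Q value as the offload
    scheduling position decision and wireless transmission power decision."""
    relaxed = offload_net(state)                       # relaxed offloading action
    best_group, best_q = None, float("-inf")
    for action in expand_offload_actions(relaxed, k):  # k binary alternatives
        power = power_net(state, action)               # matching power allocation
        q = local_q_net(state, action, power)          # local Q value of the group
        if q > best_q:
            best_group, best_q = (action, power), q
    return best_group
```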
In one embodiment, the host server gathers and stores relevant information for each execution of the target task in the edge computing system as a training sample for the edge computing simulation system.
In an embodiment, the main server updates the edge computing simulation system according to the edge computing system after a certain update preset time, so that the edge computing simulation system is more matched with the edge computing system, and the reliability of the method is improved. It will be appreciated that the present embodiment does not specifically limit the update preset time, that is, the update preset time may be a specific time length (such as one hour, one day, etc.), or the edge computing simulation system may be updated immediately according to the change of the edge computing system.
Fig. 23 is a flowchart of a task processing method for edge calculation according to an embodiment of the application. The specific steps are as follows.
1) The main server constructs an edge computing simulation system matched with the edge computing system;
2) The method comprises the steps that a main server obtains a task unloading scheduling algorithm and a computing resource allocation algorithm in a constructed edge computing simulation system;
3) The main server sends a task unloading scheduling algorithm and a computing resource allocation algorithm to the terminal;
4) The main server sends a computing resource allocation algorithm to the edge server;
5) The target task reaches the terminal;
6) The terminal acquires task information and wireless communication environment information;
7) The terminal acquires the edge state information of the edge server;
8) The terminal calls a task unloading scheduling algorithm according to the task information, the wireless communication environment information and the edge state information to obtain a task unloading scheduling strategy;
9) The terminal sends the target task to an edge server or calculates locally according to the task unloading scheduling strategy;
10) When the terminal calculates the target task locally, the terminal processes the target task according to the computing resource allocation algorithm to obtain a local processing result;
11) The edge server processes the received target task according to the computing resource allocation algorithm to obtain an edge processing result;
12) The edge server sends the edge processing result to the terminal;
13) The edge server sends the related data to the main server for storage by the main server;
14) The terminal sends the related data to the main server for storage by the main server;
15) The main server updates the edge computing simulation system according to the edge computing system each time a preset interval elapses.
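For orientation, the fifteen steps above can be condensed into a toy, runnable sketch; every class, method name and placeholder "algorithm" below is invented for illustration and stands in for the trained networks and the convex solver.

```python
# Toy sketch of the fig. 23 message flow; all names are illustrative assumptions.
import random

class EdgeServer:
    def __init__(self, alloc_algo):
        self.alloc_algo = alloc_algo                      # step 4
    def state(self):
        return {"load": random.random()}                  # step 7 (edge state info)
    def process(self, task):
        cycles = self.alloc_algo(task)                    # step 11 (edge policy)
        return f"edge result ({cycles:.1f} cycles)"       # step 12

class Terminal:
    def __init__(self, offload_algo, alloc_algo, edges):
        self.offload_algo, self.alloc_algo = offload_algo, alloc_algo  # step 3
        self.edges = edges
    def on_task(self, task):                              # steps 5-9
        state = (task, {"snr": random.random()}, [e.state() for e in self.edges])
        target = self.offload_algo(state)
        if target is None:                                # local processing
            return f"local result ({self.alloc_algo(task):.1f} cycles)"  # step 10
        return self.edges[target].process(task)

# Placeholders standing in for the trained networks / the convex solver.
offload_algo = lambda state: None if state[1]["snr"] < 0.5 else 0
alloc_algo = lambda task: task["size"] * 2.0

terminal = Terminal(offload_algo, alloc_algo, [EdgeServer(alloc_algo)])
print(terminal.on_task({"size": 3.0}))                    # step 5: task arrives
```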
The embodiment of the present application further provides a main server, which may implement the task processing method applied to the main server. Referring to fig. 24, the apparatus includes:
a simulation module 2410 for establishing an edge computing simulation system that matches the edge computing system; the edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server;
the training module 2420 is configured to train the multi-agent deep reinforcement learning model in the edge computing simulation system to obtain a task unloading scheduling algorithm, so that task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to execute the task reach a first optimization target;
the optimization module 2430 is configured to obtain, in the edge computing simulation system, a computing resource allocation algorithm based on the convex optimization mathematical model, so that task execution time and task energy consumption cost reach a second optimization target;
A first sending module 2440, configured to send the task offload scheduling algorithm and the computing resource allocation algorithm to the terminal, so that the target task of the terminal is processed according to the task offload scheduling algorithm and the computing resource allocation algorithm;
a second sending module 2450 is configured to send the computing resource allocation algorithm to the edge server, so that the edge server processes the received target task according to the computing resource allocation algorithm.
The specific implementation of the main server device in this embodiment is substantially identical to the specific implementation of the task processing method applied to the main server, and will not be described herein.
The embodiment of the present application further provides an edge server, which may implement the task processing method applied to the edge server, and referring to fig. 25, the device includes:
a first edge receiving module 2510, configured to receive, from the main server, a computing resource allocation algorithm generated by executing the task processing method of an edge computing system as in the first aspect;
an edge sending module 2520, configured to send current edge state information to a terminal;
a second edge receiving module 2530, configured to receive a target task from a terminal;
an edge execution module 2540, configured to execute a computing resource allocation algorithm according to the target task, to obtain an edge computing policy;
The edge processing module 2550 is configured to schedule the computing resource to process the target task based on the edge computing policy.
The specific implementation manner of the edge server device in this embodiment is substantially identical to the specific implementation manner of the task processing method applied to the edge server, and will not be described herein.
The embodiment of the application also provides a terminal, which can realize the task processing method applied to the terminal, and referring to fig. 26, the device comprises:
a first terminal receiving module 2610, configured to receive, from the main server, a task offload scheduling algorithm and a computing resource allocation algorithm generated by executing the task processing method of the edge computing system as in the first aspect;
a terminal acquiring module 2620, configured to acquire current target task and wireless communication environment information;
a second terminal receiving module 2630 for receiving edge state information from an edge server;
the terminal execution module 2640 is configured to execute a task offloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information, so as to obtain a task scheduling policy;
the terminal processing module 2650 is configured to offload, based on the task scheduling policy, the target task to at least one edge server for processing or process the target task locally; and when the target task is processed locally, processing the target task according to a computing resource allocation algorithm.
The specific implementation manner of the terminal device in this embodiment is basically the same as the specific implementation manner of the task processing method applied to the terminal, and will not be described herein.
The embodiment of the application also provides a storage medium, which is a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, it implements the above-mentioned task processing methods of the edge computing system.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment of the application also provides electronic equipment, which comprises:
At least one memory;
at least one processor;
at least one program;
the program is stored in a memory and the processor executes the at least one program to implement the method of the application as described above. The electronic equipment can be any intelligent terminal including a mobile phone, a tablet personal computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a vehicle-mounted computer and the like.
Referring to fig. 27, fig. 27 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 2701 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solution provided by the embodiments of the present application;
the memory 2702 may be implemented in the form of a ROM (read only memory), a static storage device, a dynamic storage device, a RAM (random access memory), or the like. Memory 2702 may store an operating system and other application programs, and when the technical solutions provided in the embodiments of the present disclosure are implemented in software or firmware, relevant program codes are stored in memory 2702 and invoked by processor 2701 to perform the methods of the embodiments of the present disclosure;
An input/output interface 2703 for implementing information input and output;
the communication interface 2704 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
a bus 2705 for transferring information among the various components of the device (e.g., processor 2701, memory 2702, input/output interface 2703, and communication interface 2704);
wherein the processor 2701, the memory 2702, the input/output interface 2703 and the communication interface 2704 are communicatively connected to one another inside the device through the bus 2705.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, and both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" and similar expressions mean any combination of these items, including any combination of single items or plural items. For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.
Claims (20)
1. A task processing method of an edge computing system, wherein the edge computing system comprises a main server, at least two edge servers and at least one terminal; the terminal is used for initiating a target task to be processed, the terminal or the edge server is used for executing the received target task, the task processing method is applied to the main server, and the task processing method comprises the following steps:
establishing an edge computing simulation system matched with the edge computing system; the edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server;
training a multi-agent deep reinforcement learning model in an edge computing simulation system to obtain a task unloading scheduling algorithm so as to enable task execution time, task energy consumption cost and task transmission reliability required by the simulation terminal and/or the simulation server to achieve a first optimization target; the task unloading scheduling algorithm is used for adjusting an execution allocation strategy and a transmitting power strategy adopted by the terminal when the terminal executes a target task;
In the edge computing simulation system, based on a convex optimization mathematical model, a computing resource allocation algorithm is obtained so that the task execution time and the task energy consumption cost reach a second optimization target; the computing resource allocation algorithm is used for adjusting the computing resource allocation adopted by the terminal and/or the edge server when the target task is executed;
transmitting the task unloading scheduling algorithm and the computing resource allocation algorithm to the terminal so that a target task of the terminal is processed according to the task unloading scheduling algorithm and the computing resource allocation algorithm;
and sending the computing resource allocation algorithm to the edge server so that the edge server processes the received target task according to the computing resource allocation algorithm.
2. The method for processing tasks of an edge computing system according to claim 1, wherein said establishing an edge computing simulation system matched with said edge computing system comprises:
acquiring the transmission time delay of the simulation terminal to send a task to the simulation server;
acquiring a first calculation time delay required by the simulation terminal to execute a task;
acquiring a second calculation time delay required by the simulation server to execute a task;
Determining the task execution time according to the transmission time delay, the first calculation time delay and the second calculation time delay;
acquiring transmission energy consumption of the simulation terminal to send a task to the simulation server;
acquiring first calculation energy consumption required by the simulation terminal to execute a task;
acquiring second calculation energy consumption required by the simulation server to execute a task;
determining the task energy consumption cost according to the transmission energy consumption, the first calculation energy consumption and the second calculation energy consumption;
acquiring the reliable probability of task transmission;
determining the task transmission reliability according to the task transmission reliability probability;
constructing a simulation task optimization target according to the task execution time, the task energy consumption cost and the task transmission reliability; wherein the simulation task optimization objective includes the first optimization objective and the second optimization objective.
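For concreteness, one plausible formalization of the quantities named in claim 2 is given below; the patent's own notation is not reproduced here, so every symbol (the local-execution indicator x_i and the weights α, β, γ) is an assumption for illustration.

```latex
% Illustrative formalization only; all symbols are assumptions.
T_i = x_i\,t_i^{\mathrm{loc}} + (1-x_i)\bigl(t_i^{\mathrm{tx}} + t_i^{\mathrm{edge}}\bigr),\qquad
E_i = x_i\,e_i^{\mathrm{loc}} + (1-x_i)\bigl(e_i^{\mathrm{tx}} + e_i^{\mathrm{edge}}\bigr),\\
\rho_i = \Pr\bigl(r_i \ge r_{\mathrm{th}}\bigr),\qquad
\min_{x,\,p,\,f}\ \sum_i \bigl(\alpha\,T_i + \beta\,E_i - \gamma\,\rho_i\bigr).
```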
3. The method for processing tasks of an edge computing system according to claim 2, wherein said obtaining a transmission delay of a task transmitted from said simulation terminal to said simulation server; acquiring a first calculation time delay required by the simulation terminal to execute a task; obtaining a second computation time delay required by the simulation server to execute a task, including:
Based on a non-orthogonal frequency division multiple access communication protocol, establishing a simulation channel according to a Rayleigh fading model and a free space path loss model; the simulation channel is used for describing wireless channel link information between the simulation terminal and the simulation server;
establishing task information of a simulation task based on a task random arrival model;
based on a Shannon formula model, the simulation channel and the task information, constructing the transmission delay by taking the transmitting power as a variable;
acquiring terminal computing resources of the simulation terminal;
based on the terminal computing resources and the task information, constructing a first computing time delay required by the simulation terminal to execute the task by taking computing resource allocation of the simulation terminal as a variable;
acquiring server computing resources of the simulation server;
and based on the server computing resources and the task information, constructing a second computing time delay required by the simulation server to execute the task by taking the computing resource allocation of the simulation server as a variable.
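Continuing in code, a minimal numeric sketch of the delay models in claim 3 follows; the bandwidth, noise power, cycles-per-bit constant and carrier frequency are illustrative assumptions, and the Rayleigh term is drawn as a unit-mean exponential channel power gain.

```python
# Sketch of claim 3's delay models; every constant is an illustrative assumption.
import math, random

B = 1e6            # channel bandwidth (Hz)
N0 = 1e-13         # noise power (W)
C_PER_BIT = 500    # CPU cycles needed per bit of task data

def channel_gain(distance_m, carrier_hz=2.4e9):
    fspl = (3e8 / (4 * math.pi * distance_m * carrier_hz)) ** 2  # free-space path loss
    rayleigh = random.expovariate(1.0)          # |h|^2 under Rayleigh fading
    return fspl * rayleigh

def transmission_delay(bits, tx_power_w, distance_m):
    rate = B * math.log2(1 + tx_power_w * channel_gain(distance_m) / N0)  # Shannon
    return bits / rate

def computation_delay(bits, cpu_hz):
    return bits * C_PER_BIT / cpu_hz            # cycles / allocated frequency

t_tx = transmission_delay(bits=1e6, tx_power_w=0.1, distance_m=100)
t_loc = computation_delay(bits=1e6, cpu_hz=1e9)     # first calculation time delay
t_edge = computation_delay(bits=1e6, cpu_hz=10e9)   # second calculation time delay
```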
4. A method for processing tasks in an edge computing system according to claim 3, wherein said obtaining said simulation terminal transmits the transmission energy consumption of tasks to said simulation server; acquiring first calculation energy consumption required by the simulation terminal to execute a task; obtaining second calculation energy consumption required by the simulation server to execute tasks, including:
Determining the transmission energy consumption based on the Shannon formula model, the simulation channel and the task information;
determining first calculation energy consumption required by the simulation terminal to execute a task based on the terminal calculation resource and the task information;
and determining second computing energy consumption required by the simulation server to execute tasks based on the server computing resources and the task information.
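Extending the sketch above to claim 4's energy terms, here using the widely adopted κf² energy-per-cycle CMOS model; κ and the other constants are assumptions.

```python
# Sketch of claim 4's energy terms under the kappa*f^2-per-cycle CMOS model.
KAPPA = 1e-27          # effective switched-capacitance coefficient (assumed)
C_PER_BIT = 500        # CPU cycles per bit (as in the delay sketch)

def transmission_energy(tx_power_w, t_tx_s):
    return tx_power_w * t_tx_s                        # E_tx = p * t_tx

def computation_energy(bits, cpu_hz):
    return KAPPA * (cpu_hz ** 2) * bits * C_PER_BIT   # E = kappa * f^2 * cycles

e_tx = transmission_energy(0.1, 0.02)                 # transmission energy consumption
e_loc = computation_energy(1e6, 1e9)                  # first calculation energy
e_edge = computation_energy(1e6, 10e9)                # second calculation energy
```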
5. A method for processing tasks in an edge computing system according to claim 3 wherein said obtaining a task transmission reliability probability comprises:
based on the simulation channel and the Shannon formula model, constructing the transmission rate of the simulation terminal for transmitting information to the simulation server by taking the transmission power of the simulation terminal as a variable;
and calculating the probability that the transmission rate reaches a preset condition as the task transmission reliability probability.
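Under the Rayleigh fading assumption of the sketches above, the probability in claim 5 admits a closed form: the Shannon rate meets the threshold exactly when the exponential channel power gain exceeds a computable level. The threshold rate and mean gain below are assumptions.

```python
# Closed-form reliability under Rayleigh fading; constants are assumptions.
import math

B, N0 = 1e6, 1e-13                             # bandwidth (Hz), noise power (W)

def transmission_reliability(tx_power_w, mean_gain, r_th_bps):
    snr_needed = 2 ** (r_th_bps / B) - 1       # rate >= r_th  <=>  SNR >= this
    g_needed = snr_needed * N0 / tx_power_w    # required channel power gain
    return math.exp(-g_needed / mean_gain)     # P(exponential gain >= g_needed)

rho = transmission_reliability(tx_power_w=0.1, mean_gain=1e-10, r_th_bps=5e5)
```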
6. The method for processing tasks of an edge computing system according to claim 1, wherein training a multi-agent deep reinforcement learning model in an edge computing simulation system comprises:
acquiring simulation state information of the current time slot of the simulation terminal;
acquiring an alternative action decision corresponding to the simulation state information;
Determining a reward model according to the first optimization objective;
taking the simulation state information and the alternative action decision as training samples, and performing multi-round training on the multi-agent deep reinforcement learning model to obtain a trained multi-agent deep reinforcement learning model as the task unloading scheduling algorithm; the training process of the multi-agent deep reinforcement learning model comprises the following steps: the training samples are input into a multi-agent deep reinforcement learning model, and the multi-agent deep reinforcement learning model is adjusted by using a reward model.
7. The method of claim 6, wherein the multi-agent deep reinforcement learning model includes an offload decision network, a transmission power distribution network, a local Q network, and a global Q network, wherein inputting training samples into the multi-agent deep reinforcement learning model and adjusting the multi-agent deep reinforcement learning model with a reward model comprises:
inputting the simulation state information into the unloading decision network to obtain a relaxation task unloading action;
expanding the relaxation task unloading action based on the task unloading action expansion algorithm to obtain an alternative unloading action group; the set of alternative unloading actions includes a plurality of alternative unloading actions;
Inputting the simulation state information and the alternative unloading action group into the transmission power distribution network to obtain an alternative power distribution space; the alternative power allocation space includes a plurality of alternative power allocation groups corresponding to each of the alternative offloading actions; the alternative power allocation group includes a plurality of alternative power allocation actions that match corresponding to each alternative offloading action;
inputting the alternative power distribution space into the local Q network to obtain a local Q value, and taking an alternative power distribution action corresponding to the maximum local Q value as a local action decision;
executing the local action decision based on the task information and the rewarding model to obtain local simulation rewards;
inputting the local Q value into the global Q network to obtain a global Q value;
based on the global Q value, a global action decision is obtained;
executing the global action decision based on the task information and the rewarding model to obtain global simulation rewards;
updating the offload decision network based on the relaxed task offload action and the local action decision;
updating the local Q network based on the local simulation rewards, the local action decisions and a local Q value for a next time slot;
Updating the global Q network based on the global simulation rewards, the global action decisions and the global Q value of the next time slot;
updating the transmission power distribution network based on the global Q value;
the training of the multi-agent deep reinforcement learning model to obtain a task offloading scheduling algorithm includes:
and obtaining the task unloading scheduling algorithm according to the unloading decision network, the transmission power distribution network and the local Q network in the trained multi-agent deep reinforcement learning model.
8. The method of claim 7, wherein the global Q network includes a first processing module and a second processing module;
the step of inputting the local Q value into a global Q network to obtain a global Q value comprises the following steps:
inputting the local Q value into a first processing module, so that the first processing module takes the simulation state information as a first weight, and processing the local estimated value by combining the first weight to obtain a high-dimensional feature vector;
and inputting the high-dimensional feature vector into a second processing module, so that the second processing module takes the simulation state information as a second weight, and processing the high-dimensional feature vector by combining the second weight to obtain a global Q value.
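Claim 8's state-conditioned mixing resembles QMIX-style mixers, where hypernetworks generate the first and second weights from the (simulation) state. A minimal sketch under that reading follows; the layer sizes and the abs() monotonicity device are assumptions.

```python
# Sketch of a state-conditioned mixing network in the style of claim 8.
import torch
import torch.nn as nn

class Mixer(nn.Module):
    def __init__(self, n_agents=3, state_dim=16, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(state_dim, n_agents * hidden)  # "first weight" from state
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)             # "second weight" from state
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.hidden = n_agents, hidden

    def forward(self, local_q, state):
        # first processing module: local Q values -> high-dimensional feature vector
        w1 = self.w1(state).abs().view(-1, self.n_agents, self.hidden)
        feat = torch.relu(torch.bmm(local_q.unsqueeze(1), w1).squeeze(1)
                          + self.b1(state))
        # second processing module: feature vector -> global Q value
        w2 = self.w2(state).abs().unsqueeze(-1)
        return (torch.bmm(feat.unsqueeze(1), w2).squeeze(-1).squeeze(-1)
                + self.b2(state).squeeze(-1))

q_global = Mixer()(torch.randn(8, 3), torch.randn(8, 16))  # batch of 8
```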
9. The method of task processing for an edge computing system of claim 7, wherein said updating said offload decision network based on said relaxed task offload action and said local action decision comprises:
taking the mean square error of the relaxation task unloading action and the local action decision as an unloading decision network error;
updating the offload decision network based on the gradient descent model and the offload decision network error.
10. The method of claim 7, wherein updating the local Q network based on the local simulation rewards, the local action decisions, and a local Q value of a next time slot comprises:
based on the local action decision, obtaining a local action decision Q value corresponding to the local action decision;
adding the local Q value of the next time slot with the local simulation rewards to be used as a local network difference value;
taking the mean square error of the local action decision Q value and the local network difference value as a local network error;
updating the local Q network based on a gradient descent model and the local network error.
11. The method of claim 7, wherein updating the global Q network based on the global simulation rewards, the global action decisions and the global Q value of the next slot comprises:
Based on the global action decision, obtaining a global action decision Q value corresponding to the global action decision;
adding the global Q value of the next time slot with the global simulation rewards to be used as a global network difference value;
taking the mean square error of the global action decision Q value and the global network difference value as a global network error;
and updating the global Q network based on the gradient descent model and the global network error.
12. The method for processing tasks in an edge computing system according to claim 7 wherein updating the transmission power distribution network based on the global Q value comprises:
taking the negative number of the global Q value as a transmission power distribution network error;
updating the transmission power distribution network based on the gradient descent model and the transmission power distribution network error.
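The update rules of claims 9 through 12 reduce to four losses: a mean-squared error between the relaxed action and the local decision, two temporal-difference errors whose targets add the next slot's Q value to the reward (no discount factor appears in the claims), and the negated global Q value. A compact sketch, with placeholder tensors standing in for one training step's quantities:

```python
# Sketch of the losses in claims 9-12; the tensors are illustrative placeholders.
import torch
import torch.nn.functional as F

relaxed_action = torch.rand(4, requires_grad=True)   # offload decision net output
local_decision = torch.tensor([1.0, 0.0, 0.0, 0.0])  # chosen alternative action
q_local = torch.tensor(1.2, requires_grad=True)
q_local_next = torch.tensor(1.0)
q_global = torch.tensor(2.1, requires_grad=True)
q_global_next = torch.tensor(1.8)
r_local, r_global = torch.tensor(0.3), torch.tensor(0.5)

loss_offload = F.mse_loss(relaxed_action, local_decision)        # claim 9
loss_local_q = F.mse_loss(q_local, r_local + q_local_next)       # claim 10
loss_global_q = F.mse_loss(q_global, r_global + q_global_next)   # claim 11
loss_power = -q_global                                           # claim 12
(loss_offload + loss_local_q + loss_global_q + loss_power).backward()
```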
13. The method for processing tasks in an edge computing system according to claim 7, wherein said expanding said relaxed task offload actions based on said task offload action expansion algorithm to obtain an alternative offload action group comprises:
processing the relaxation task unloading action based on the task unloading action expansion algorithm and the dichotomy to obtain a first alternative unloading action;
Processing the relaxed task unloading action based on the task unloading action expansion algorithm, the dichotomy and the ascending sort to obtain a second alternative unloading action;
the first and second alternative unloading actions constitute the alternative unloading action group.
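One plausible reading of claim 13, in the spirit of DROO-style order-preserving quantization: the dichotomy rounds the relaxed action at 0.5, and the ascending sort ranks entries by their distance to 0.5 so that the most ambiguous ones are flipped to form further alternatives. The details below are assumptions.

```python
# Sketch of a dichotomy + ascending-sort expansion of a relaxed offload action.
import numpy as np

def expand_offload_actions(relaxed: np.ndarray, k: int) -> np.ndarray:
    first = (relaxed > 0.5).astype(float)           # dichotomy at 0.5
    order = np.argsort(np.abs(relaxed - 0.5))       # ascending distance to 0.5
    candidates = [first]
    for i in range(min(k - 1, relaxed.size)):       # flip the most ambiguous bits
        alt = first.copy()
        alt[order[i]] = 1.0 - alt[order[i]]
        candidates.append(alt)
    return np.stack(candidates)

group = expand_offload_actions(np.array([0.9, 0.4, 0.55, 0.1]), k=4)
```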
14. The method for processing tasks in an edge computing system according to claim 2, wherein in the edge computing simulation system, based on a convex optimization mathematical model, a computing resource allocation algorithm is obtained to achieve a second optimization objective for the task execution time and the task energy consumption cost, comprising:
constructing a computing resource optimization function according to the task execution time and the task energy consumption cost;
taking the computing resource allocation as a computing resource optimization variable;
taking the allocable range of the computing resource allocation as a computing resource constraint;
constructing a computing resource optimization problem based on the computing resource optimization function, the computing resource optimization variable and the computing resource constraint;
and processing the computational resource optimization problem by using the convex optimization mathematical model to obtain the computational resource allocation algorithm.
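A sketch of the convex resource-allocation step of claim 14, assuming a weighted time-plus-energy objective over per-task CPU frequencies and a total-frequency constraint; the weights, constants and the cvxpy formulation are illustrative assumptions.

```python
# Sketch of claim 14's convex resource-allocation problem; constants are assumed.
import cvxpy as cp
import numpy as np

cycles = np.array([2e8, 5e8, 1e8])        # CPU cycles demanded by each task
F_total = 2e9                             # total CPU frequency of the device (Hz)
kappa, alpha, beta = 1e-27, 1.0, 1e-3     # CMOS constant and objective weights

f = cp.Variable(3, nonneg=True)           # computing resource optimization variable
time_cost = cp.sum(cp.multiply(cycles, cp.inv_pos(f)))    # task execution time
energy_cost = (kappa * cycles) @ cp.square(f)             # task energy consumption
problem = cp.Problem(cp.Minimize(alpha * time_cost + beta * energy_cost),
                     [cp.sum(f) <= F_total])              # allocable range constraint
problem.solve()
print(f.value)                            # optimal per-task CPU frequencies
```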
15. A task processing method of an edge computing system, applied to the edge server, the task processing method comprising:
receiving a computing resource allocation algorithm from the host server generated by performing a task processing method of an edge computing system as claimed in any one of claims 1 to 14;
transmitting current edge state information to the terminal;
receiving a target task from the terminal;
executing a computing resource allocation algorithm according to the target task to obtain an edge computing strategy;
and scheduling computing resources to process the target task based on the edge computing strategy.
16. A task processing method of an edge computing system, applied to the terminal, the task processing method comprising:
receiving a task offload scheduling algorithm and a computing resource allocation algorithm from the host server generated by executing a task processing method of an edge computing system according to any one of claims 1-14;
acquiring current target task and wireless communication environment information;
receiving edge state information from the edge server;
executing the task unloading scheduling algorithm according to the target task, the wireless communication environment information and the edge state information to obtain a task scheduling strategy;
Unloading the target task to at least one edge server for processing or processing the target task locally based on the task scheduling policy; and processing the target task according to the computing resource allocation algorithm while processing the target task locally.
17. The method for processing tasks of an edge computing system according to claim 16 wherein said executing said task offload scheduling algorithm based on said target task, said wireless communication environment information and said edge state information to obtain a task scheduling policy comprises:
inputting an unloading decision network by taking the target task, the wireless communication environment information and the edge state information as parameters to obtain a target relaxation task unloading action;
expanding the target relaxation task unloading action based on the task unloading action expansion algorithm to obtain a target alternative unloading action group;
inputting a transmission power distribution network by taking the target task, the wireless communication environment information, the edge state information and the target alternative unloading action group as parameters to obtain a target transmission power distribution space;
And inputting the target transmission power distribution space into a local Q network to obtain a task scheduling strategy.
18. A host server for use in an edge computing system, the edge computing system comprising the host server, at least two edge servers, and at least one terminal, the host server comprising:
the simulation module is used for establishing an edge computing simulation system matched with the edge computing system; the edge computing simulation system comprises a simulation terminal corresponding to the terminal and a simulation server corresponding to the edge server;
the training module is used for training the multi-agent deep reinforcement learning model in the edge computing simulation system to obtain a task unloading scheduling algorithm so as to enable the task execution time, the task energy consumption cost and the task transmission reliability required by the simulation terminal and/or the simulation server to execute the task to reach a first optimization target;
the optimization module is used for acquiring a computing resource allocation algorithm based on a convex optimization mathematical model in the edge computing simulation system so as to enable the task execution time and the task energy consumption cost to reach a second optimization target;
the first sending module is used for sending the task unloading scheduling algorithm and the computing resource allocation algorithm to the terminal so that a target task of the terminal is processed according to the task unloading scheduling algorithm and the computing resource allocation algorithm;
And the second sending module is used for sending the computing resource allocation algorithm to the edge server so that the edge server processes the received target task according to the computing resource allocation algorithm.
19. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements a task processing method of an edge computing system according to any of claims 1-14,
or,
a method of task processing for an edge computing system as recited in claim 15,
or,
a method of task processing for an edge computing system as claimed in any one of claims 16 to 17.
20. An electronic device, comprising:
a processor; and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform a task processing method of an edge computing system as claimed in any one of claims 1-14 via execution of the executable instructions,
or,
a method of task processing for an edge computing system as recited in claim 15,
or,
a method of task processing for an edge computing system as claimed in any one of claims 16 to 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310860159.XA CN117009053A (en) | 2023-07-13 | 2023-07-13 | Task processing method of edge computing system and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117009053A true CN117009053A (en) | 2023-11-07 |
Family
ID=88562894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310860159.XA Pending CN117009053A (en) | 2023-07-13 | 2023-07-13 | Task processing method of edge computing system and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117009053A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118215080A (en) * | 2023-12-05 | 2024-06-18 | 国网河北省电力有限公司信息通信分公司 | Edge computing task distribution method, controller and system based on soft definition network |
CN117370035A (en) * | 2023-12-08 | 2024-01-09 | 国网浙江省电力有限公司宁波供电公司 | Real-time simulation computing resource dividing system and method |
CN117370035B (en) * | 2023-12-08 | 2024-05-07 | 国网浙江省电力有限公司宁波供电公司 | Real-time simulation computing resource dividing system and method |
CN117473798A (en) * | 2023-12-26 | 2024-01-30 | 国家超级计算天津中心 | Simulation project management method, device, equipment and storage medium |
CN117473798B (en) * | 2023-12-26 | 2024-05-14 | 国家超级计算天津中心 | Simulation project management method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |