WO2021233053A1 - Method and communication apparatus for computing offloading - Google Patents

Method and communication apparatus for computing offloading

Info

Publication number
WO2021233053A1
WO2021233053A1 (application PCT/CN2021/088860)
Authority
WO
WIPO (PCT)
Prior art keywords
edge node
terminal device
task
computing
cost
Prior art date
Application number
PCT/CN2021/088860
Other languages
English (en)
French (fr)
Inventor
刘志成
宋金铎
赵明宇
严学强
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21807848.3A (published as EP4142235A4)
Publication of WO2021233053A1
Priority to US17/990,944 (published as US20230081937A1)

Classifications

    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5077: Logical partitioning of resources; management or configuration of virtualized resources
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G06F 9/5094: Allocation of resources taking into account power or heat criteria
    • G06F 18/217: Validation; performance evaluation; active pattern learning techniques
    • G06F 2209/509: Offload (indexing scheme relating to G06F 9/50)
    • H04L 67/08: Protocols specially adapted for terminal emulation, e.g. Telnet
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1008: Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L 67/101: Server selection for load balancing based on network conditions
    • H04L 67/34: Arrangements involving the movement of software or configuration parameters
    • H04L 67/59: Providing operational support to end devices by off-loading in the network or by emulation, e.g. when they are unavailable
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of communications, and in particular to a method and communication device for computing offloading.
  • Edge computing provides computing and storage resources at the edge of the network, and is characterized by lower latency, lower energy consumption, bandwidth savings, stronger privacy, and more intelligence.
  • Edge computing sits at the intersection of computing paradigms and network communication, and plays an irreplaceable role in meeting a wide range of business needs and improving user experience.
  • Edge computing here refers to the terminal device offloading computation-intensive tasks to network edge nodes for computation and receiving the running results of the corresponding tasks, thereby achieving computation offloading.
  • There are two specific considerations for edge computing. On the one hand, terminal devices have shortcomings such as low computing power, high sensitivity to energy consumption, and limited supported functions; on the other hand, network edge nodes exhibit scattered computing resources, heterogeneous power supplies, and diverse system architectures. Therefore, how to better manage and allocate the computing power of edge nodes has become an important and urgent issue.
  • the present application provides a method and communication device for computing offloading, which can enable edge nodes and terminal devices to have a broader perception of the environment in actual decision-making, thereby effectively improving the decision-making benefits of both.
  • In a first aspect, a method includes: a first terminal device sends a first state of a first computing task to a first edge node, where the first edge node is an edge node from which the first terminal device obtains computing resources.
  • The first state includes at least one of: the length of the data stream for transmitting the first computing task, the number of clock cycles required to compute the first computing task, and the penalty value of the first computing task.
  • The first terminal device receives a second offloading decision sent by the first edge node. The second offloading decision is determined from the first state and includes computing resource allocation information for one or more second terminal devices, where a second terminal device is a terminal device that obtains computing resources from the first edge node and the first terminal device is one of the one or more second terminal devices.
  • The first terminal device determines a first offloading decision for the first computing task according to the second offloading decision. The first offloading decision indicates whether the first terminal device offloads the first computing task to the first edge node for computation.
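The exchange above can be sketched as data structures plus a decision rule. This is a minimal illustrative sketch, not the patent's method: the field names (`data_len_bits`, `alloc_hz`, etc.) are hypothetical, and transmission delay is deliberately ignored to keep the rule one line.

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    """First state of a computing task (hypothetical field names)."""
    data_len_bits: float  # length of the data stream for transmitting the task
    cpu_cycles: float     # clock cycles required to compute the task
    penalty: float        # penalty value of the task

@dataclass
class EdgeAllocation:
    """Second offloading decision broadcast by the edge node:
    per-device computing-resource allocation (CPU cycles per second)."""
    alloc_hz: dict        # terminal id -> allocated CPU frequency

def first_offloading_decision(dev_id, state, edge_alloc, local_hz):
    """Toy first offloading decision: offload iff the edge would finish the
    task faster than local computation (transmission delay ignored)."""
    edge_hz = edge_alloc.alloc_hz.get(dev_id, 0.0)
    if edge_hz <= 0.0:
        return False  # no resources granted by the edge node: compute locally
    return state.cpu_cycles / edge_hz < state.cpu_cycles / local_hz
```

A terminal granted a 4 GHz share against a 1 GHz local CPU would offload; a terminal absent from the allocation would not.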
  • In this way, the terminal device sends the state of its computing task to the edge node, the edge node determines the allocation of computing resources according to the states of the one or more computing tasks it receives, and the edge node then sends the allocation of computing resources to the terminal devices it serves.
  • According to this resource allocation, the terminal device determines whether to offload the computing task to the edge node for computation, which gives the terminal device a broader perception of the environment in actual decision-making, thereby effectively improving the decision-making benefits of both.
  • When the first offloading decision instructs the first terminal device to offload the first computing task to the first edge node for computation, the first terminal device sends the first computing task to the first edge node and receives the computation result of the first computing task from the first edge node; alternatively, when the first offloading decision instructs the first terminal device not to offload the first computing task, the first terminal device determines the computation result of the first computing task locally.
  • The first terminal device determines the first offloading decision for the first computing task according to the second offloading decision as follows: the first terminal device updates the parameters in the first state of the first computing task according to the second offloading decision to obtain a second state of the first computing task; the first terminal device computes the cost value of the first computing task according to the second state, where the cost value includes a local cost and an offloading cost; and the first terminal device determines the first offloading decision according to the cost value of the first computing task.
  • In this way, the terminal device determines, according to the edge node's resource allocation, whether to offload the computing task to the edge node for computation, which gives the terminal device a broader perception of the environment in actual decision-making, thereby effectively improving the decision-making gains of both.
  • The first terminal device computes the cost value of the first computing task according to the second state by using a first cost function in a multi-agent deep reinforcement learning (MADRL) algorithm. The first cost function includes an offloading cost function and a local computation cost function: the offloading cost function is used to determine the offloading cost of the first computing task, and the local computation cost function is used to determine the local cost of the first computing task.
  • The first terminal device determines the first offloading decision as follows: the first terminal device iteratively updates the state and the cost value of the first computing task according to the MADRL algorithm; when the MADRL algorithm reaches its termination condition, the first terminal device determines the first offloading decision according to the minimum cost value of the first computing task.
  • In this way, the terminal device uses the MADRL algorithm to determine, according to the edge node's resource allocation, whether to offload the computing task to the edge node for computation, which gives the terminal device a broader perception of the environment in actual decision-making and effectively improves the decision-making benefits of both.
  • The offloading overhead of the first computing task includes a first energy consumption overhead and a first delay overhead, where the first energy consumption overhead includes the energy consumed by offloading the first computing task to the first edge node, and the first delay overhead includes the delay for the first terminal device to offload the first computing task to the first edge node plus the delay for the first edge node to determine the computation result of the first computing task.
  • The local overhead of the first computing task includes a second energy consumption overhead and a second delay overhead, where the second energy consumption overhead includes the energy consumed by the first terminal device computing the first computing task locally.
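The overhead terms above can be made concrete with the cost model commonly used in the mobile-edge-computing literature; this is a hedged sketch under standard assumptions, not the patent's exact formulas. The weighting `beta` and the CMOS energy coefficient `kappa` are illustrative choices.

```python
def offload_cost(data_bits, cpu_cycles, rate_bps, edge_hz, tx_power_w, beta=0.5):
    """Offloading overhead: first delay overhead (uplink transmission delay
    plus edge computation delay) and first energy overhead (transmission
    energy), combined with an illustrative weight beta."""
    delay = data_bits / rate_bps + cpu_cycles / edge_hz
    energy = tx_power_w * (data_bits / rate_bps)
    return beta * energy + (1 - beta) * delay

def local_cost(cpu_cycles, local_hz, kappa=1e-27, beta=0.5):
    """Local overhead: second delay overhead (local computation delay) and
    second energy overhead (kappa * f^2 joules per cycle, a common CMOS
    dynamic-power model)."""
    delay = cpu_cycles / local_hz
    energy = kappa * (local_hz ** 2) * cpu_cycles
    return beta * energy + (1 - beta) * delay

def decide(data_bits, cpu_cycles, rate_bps, edge_hz, tx_power_w, local_hz):
    """Offload exactly when the offloading cost is lower than the local cost."""
    return offload_cost(data_bits, cpu_cycles, rate_bps, edge_hz, tx_power_w) \
           < local_cost(cpu_cycles, local_hz)
```

For example, a 1 Mbit task of 10^9 cycles over a 10 Mbit/s link to a 10 GHz edge share beats a 1 GHz local CPU, so the sketch decides to offload.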
  • The first offloading decision also includes a second working power, where the second working power is the working power corresponding to the minimum cost value of the first computing task when the MADRL algorithm reaches its termination condition.
  • the first terminal device when the first offloading decision instructs the first terminal device to offload the first computing task to the first edge node for calculation, the first terminal device operates at sleep power.
  • The method further includes: the first terminal device uses a first parameter to dynamically adjust the first delay overhead, where the first parameter is used to indicate the technical difference between how the first terminal device and the first edge node process computing tasks.
  • The method further includes: the first terminal device uses a second parameter to dynamically adjust the first energy consumption overhead and the second energy consumption overhead, where the second parameter is used to indicate the first terminal device's sensitivity to energy consumption.
  • Introducing an energy trade-off parameter allows the decision content in computation offloading to be described more finely.
  • In a second aspect, a method includes: a first edge node receives the status of one or more tasks, where the status of the one or more tasks includes a first state of a first computing task sent by a first terminal device; the first edge node is an edge node that provides computing resources to one or more second terminal devices, and the first terminal device is one of the one or more second terminal devices; the first edge node determines a second offloading decision according to the status of the one or more tasks, where the second offloading decision includes the first edge node's computing resource allocation information for the one or more second terminal devices; and the first edge node broadcasts the second offloading decision to the one or more second terminal devices.
  • In this way, the edge node receives the status of the computing tasks sent by the one or more terminal devices it serves, determines the allocation of computing resources according to the received statuses, and broadcasts the allocation to the terminal devices. Each terminal device then determines, according to this resource allocation, whether to offload its computing task to the edge node for computation, so that both the edge node and the terminal devices have a broader perception of the environment in actual decision-making, thereby effectively improving the decision-making benefits of both.
  • The first edge node receives the first computing task sent by the first terminal device, determines the computation result of the first computing task, and sends the computation result of the first computing task to the first terminal device.
  • The first edge node determines the second offloading decision according to the status of the one or more tasks as follows: the first edge node updates its third state according to the status of the one or more tasks to obtain a fourth state, where the third state is the state of the first edge node before it received the one or more task statuses; the first edge node determines its cost value according to the fourth state, where the cost value of the first edge node is the cost of the first edge node allocating computing resources to the one or more computing tasks; and the first edge node determines the second offloading decision according to its cost value.
  • The first edge node determines its cost value according to the fourth state by using a first cost function and a second cost function in the MADRL algorithm. The first cost function includes an offloading cost function, used to determine the offloading cost of the one or more tasks, and a local computation cost function, used to determine the local cost of the one or more tasks. The second cost function includes an average cost function and a fairness cost function: the average cost function determines the average cost of the one or more tasks from their offloading costs and local costs, and the fairness cost function determines the fairness cost of the first edge node from the number of second terminal devices using the first edge node's computing resources. The first edge node determines its cost value from the average cost of the one or more tasks and its fairness cost.
  • In this way, the cost of the edge node weighs multiple factors: the edge node's decision term ensures the average experience of the terminal devices it serves while improving fairness in their use of resources. That is, it ensures efficient resource allocation while also avoiding serving too few users, so that the real environment reflected by the edge node's cost value is more comprehensive.
  • The first edge node determines the second offloading decision according to its cost value as follows: the first edge node iteratively updates its state and cost value according to the MADRL algorithm; when the MADRL algorithm reaches its termination condition, the first edge node determines the second offloading decision according to its minimum cost value.
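One way to combine the average cost and the fairness cost described above is sketched below. The logarithmic fairness term and the weight `w_fair` are illustrative assumptions, not the patent's formulas; the only properties carried over from the text are that the cost averages the served tasks' costs and grows when the node serves few of the available terminals.

```python
import math

def edge_node_cost(task_costs, n_served, n_total, w_fair=0.1):
    """Edge-node cost value: average of the served tasks' (offloading or
    local) costs plus a fairness penalty that increases as the node serves
    fewer of the n_total available terminals."""
    avg = sum(task_costs) / len(task_costs)           # average cost function
    fair = -math.log((n_served + 1) / (n_total + 1))  # fairness cost: fewer users -> larger
    return avg + w_fair * fair
```

Under this sketch, serving all terminals makes the fairness term vanish, while serving one of four terminals adds a positive penalty, so the MADRL search is pushed away from allocations that concentrate resources on too few users.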
  • the present application provides a communication device that has a function of implementing the method in the first aspect or any possible implementation manner thereof.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units corresponding to the above-mentioned functions.
  • the present application provides a communication device that has a function of implementing the method in the second aspect or any possible implementation manner thereof.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units corresponding to the above-mentioned functions.
  • The present application provides a communication device including an interface circuit and a processor. The interface circuit is configured to receive computer code or instructions and transmit them to the processor; when the processor runs the computer code or instructions, the method in the first aspect or any of its implementations is implemented.
  • The present application provides a communication device including an interface circuit and a processor. The interface circuit is configured to receive computer code or instructions and transmit them to the processor; when the processor runs the computer code or instructions, the method in the second aspect or any of its implementations is implemented.
  • The present application provides a communication device including at least one processor coupled to at least one memory. The at least one memory is used to store a computer program or instructions, and the at least one processor is used to call and execute the computer program or instructions from the at least one memory, so that the communication device executes the method in the first aspect or any possible implementation thereof.
  • the communication device may be a terminal device.
  • The present application provides a communication device including at least one processor coupled to at least one memory. The at least one memory is used to store a computer program or instructions, and the at least one processor is used to call and execute the computer program or instructions from the at least one memory, so that the communication device executes the method in the second aspect or any possible implementation thereof.
  • the communication device may be an edge node.
  • The present application provides a computer-readable storage medium storing computer instructions; when the computer instructions run on a computer, the method in the first aspect or any of its possible implementations is implemented.
  • The present application provides a computer-readable storage medium storing computer instructions; when the computer instructions run on a computer, the method in the second aspect or any of its possible implementations is implemented.
  • The present application provides a computer program product including computer program code; when the computer program code runs on a computer, the method in the first aspect or any of its possible implementations is implemented.
  • The present application provides a computer program product including computer program code; when the computer program code runs on a computer, the method in the second aspect or any of its possible implementations is implemented.
  • the present application provides a wireless communication system, including the communication device in the seventh aspect and the communication device in the eighth aspect.
  • Fig. 1 shows a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • Figure 2 is a schematic diagram of the training process of reinforcement learning.
  • Figure 3 is a schematic diagram of a DRL-based computational offloading method.
  • Figure 4 is a schematic diagram of the interaction process between the agent and the environment in MADRL.
  • FIG. 5 is a schematic flowchart of the method for calculating offloading provided by this application.
  • FIG. 6 is a schematic flowchart of the method for calculating offloading provided by this application.
  • Fig. 7 is a schematic flow chart of data collection based on MADDPG provided by the present application.
  • Fig. 8 is a schematic flowchart of the MADDPG-based parameter model training provided by the present application.
  • Fig. 9 is a schematic flowchart of a method for computing offloading based on MADDPG provided by the present application.
  • FIG. 10 is a schematic block diagram of a communication device 1000 provided by this application.
  • FIG. 11 is a schematic block diagram of a communication device 2000 provided by this application.
  • FIG. 12 is a schematic structural diagram of the communication device 10 provided by this application.
  • FIG. 13 is a schematic structural diagram of the communication device 20 provided by this application.
  • LTE long term evolution
  • FDD frequency division duplex
  • UMTS universal mobile telecommunications system
  • TDD time division duplex
  • NR new radio
  • 5G fifth-generation
  • V2X can include vehicle-to-network (V2N), vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), vehicle-to-pedestrian (V2P), and so on.
  • Other examples include long term evolution-vehicle (LTE-V) for inter-vehicle communication in the Internet of Vehicles, machine type communication (MTC), the Internet of Things (IoT), long term evolution-machine (LTE-M), machine-to-machine (M2M) communication, and so on.
  • FIG. 1 shows a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • the system architecture includes at least one terminal device, at least one wireless access node, and at least one edge node.
  • The terminal device accesses a resource pool composed of nodes located at the edge of the Internet (that is, the edge of the network) through a variety of wireless access methods (including but not limited to cellular and Wi-Fi) and the corresponding wireless access points.
  • An edge node is a service platform built at the edge of the network, close to the terminal device. It provides storage, computing, network, and other resources, and sinks some key business applications to the edge of the access network to reduce the bandwidth and latency losses caused by network transmission and multi-level forwarding.
  • This application uniformly abstracts devices with computing capabilities at the edge of the network as edge nodes. However, this application does not itself address the abstraction, management, and allocation of computing resources inside an edge node; it only specifies the strategy by which an edge node, as a unified whole, allocates computing resources to terminal devices.
  • Based on practical considerations, the terminal devices, edge nodes, and wireless channels in the system architecture shown in FIG. 1 satisfy the following constraints: at each moment, a single terminal device can access only one wireless channel and obtains computing resources through that channel; an edge node can provide computing resources to multiple terminal devices at the same time. The edge node to which terminal device i is connected and the wireless channel over which it communicates are denoted conn_i ∈ {0, 1, ..., N} and link_i ∈ {0, 1, ..., Z}, respectively.
  • The terminal device in the embodiments of this application may also be referred to as: user equipment (UE), mobile station (MS), mobile terminal (MT), access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, terminal, wireless communication device, user agent, user apparatus, and so on.
  • the terminal device may be a device that provides voice/data connectivity to the user, for example, a handheld device with a wireless connection function, a vehicle-mounted device, and so on.
  • Some examples of terminal devices are: mobile phones, tablet computers, notebook computers, handheld computers, mobile internet devices (MID), wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, cellular phones, cordless phones, session initiation protocol (SIP) phones, wireless local loop (WLL) stations, personal digital assistants (PDA), handheld devices with wireless communication capabilities, computing devices or other processing devices connected to wireless modems, in-vehicle devices, wearable devices, terminal devices in future networks, and so on.
  • Wearable devices, also called wearable smart devices, are a general term for devices designed and developed by applying wearable technology to items of everyday wear, such as glasses, gloves, watches, clothing, and shoes.
  • a wearable device is a portable device that is directly worn on the body or integrated into the user's clothes or accessories.
  • Wearable devices are not only a kind of hardware device, but also realize powerful functions through software support, data interaction, and cloud interaction.
  • Wearable smart devices include full-featured, large-sized devices that can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to work with other devices such as smartphones.
  • the terminal device may also be a terminal device in the Internet of Things system.
  • The IoT is an important part of the future development of information technology. Its main technical feature is connecting things to the network through communication technology, thereby realizing man-machine interconnection and an intelligent network of interconnected things.
  • The terminal device may also include sensors such as smart printers, train detectors, and gas-station sensors. Its main functions include collecting data (for some terminal devices), receiving control information and downlink data from network devices, and sending electromagnetic waves to transmit uplink data to network devices.
  • Edge computing evolved from the concept of "cloud computing" and is a new type of distributed computing model. Its specific definition is: providing users with computing and storage resources at the edge of the Internet. Here, the "edge" is defined as any network location on the path between the terminal device and the cloud data center that is closer to the user than the cloud data center. Generally speaking, "edge computing" refers to deploying resources near the junction of wireless links and wired networks, that is, near wireless access points. Edge computing has the characteristics of lower latency, lower energy consumption, bandwidth savings, stronger privacy, and more intelligence. As a key enabling technology for realizing the vision of the Internet of Things and 5G, edge computing (and its derivative concepts) sits at the intersection of computing paradigms and network communication, and plays an irreplaceable role in satisfying a wide range of business needs and improving user experience.
  • Computing offloading refers to the behavior of computing devices that transfer resource-intensive computing tasks to separate processors or other devices.
  • a computing device refers to a terminal device
  • a resource-intensive computing task is a task that requires a certain amount of calculation to complete
  • the transfer location is an edge node at the edge of the network.
  • the typical process is: the terminal device offloads the calculation task to the edge node, the edge node processes the calculation task, and the terminal device receives the calculation result of the corresponding task of the edge node.
  • Edge computing is similar to the computing offloading method in cloud computing. Specifically, one possible way to use edge computing to improve terminal performance is: the terminal equipment offloads computation-intensive tasks to the edge of the network for execution, and receives the corresponding running results, thereby achieving the purpose of computing offloading.
  • Two aspects of edge computing need particular consideration: on the one hand, terminal equipment has insufficient computing power, high energy sensitivity, and fewer supported functions; on the other hand, the network edge exhibits phenomena such as scattered computing resources, different power supply methods, and diverse system architectures. Therefore, how to manage and allocate computing power at the edge of the network has become an important issue requiring urgent attention.
  • Deep reinforcement learning (deep reinforcement learning, DRL): Reinforcement learning is a field in machine learning.
  • Figure 2 is a schematic diagram of the training process of reinforcement learning.
  • Reinforcement learning mainly includes five elements: agent, environment, state, action, and reward.
  • The input of the agent is the state, and the output is the action.
  • The training process of reinforcement learning is: the agent interacts with the environment multiple times to obtain the action, state, and reward of each interaction; these multiple (action, state, reward) sets are then used as training data to train the agent once.
  • the agent is trained for the next round until the convergence condition is met.
  • The process of obtaining the action, state, and reward of one interaction is shown in Figure 2.
  • The current state of the environment s(t) is input to the agent, and the action a(t) output by the agent is obtained.
  • The environment calculates the reward r(t) of this interaction from the relevant performance indicators under action a(t); thus the state s(t), action a(t), and reward r(t) of this interaction are obtained.
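The interaction loop described above can be sketched in a few lines of Python. This is an illustrative toy, not part of this application: `env_step` and `policy` are hypothetical stand-ins for the environment's transition and the agent's decision function.

```python
def run_episode(env_step, policy, s0, steps):
    """Collect (state, action, reward) tuples from agent-environment interaction.

    env_step(state, action) -> (next_state, reward) stands in for the environment;
    policy(state) -> action stands in for the agent.
    """
    traj, s = [], s0
    for _ in range(steps):
        a = policy(s)               # agent outputs action a(t) for state s(t)
        s_next, r = env_step(s, a)  # environment returns reward r(t) and s(t+1)
        traj.append((s, a, r))      # one (state, action, reward) training sample
        s = s_next
    return traj
```

The list of (state, action, reward) tuples collected over multiple interactions is exactly the training data described above.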
  • Deep reinforcement learning still conforms to the agent-environment interaction framework of reinforcement learning; the difference is that deep neural networks are used in the agent to make decisions. Deep reinforcement learning is an artificial intelligence method that combines deep learning and reinforcement learning. It has a wide range of applications in complex problems such as dynamic decision-making, real-time control, and image perception, and has produced some DRL-based computing offloading methods. The algorithms involved in existing computing offloading methods include but are not limited to: deep Q-learning (DQN), deep deterministic policy gradient (DDPG), and the Actor-Critic algorithm.
  • The DRL-based method model mainly includes the agent and the environment it interacts with.
  • The terminal device is regarded as the decision-making and executing entity. It uses a deep neural network to perceive the current state of the environment, interacts with the environment, and adjusts its strategy through a method similar to reinforcement learning; through multiple iterations of this perception-adjustment process, it gradually moves closer to the optimal strategy.
  • Multi-agent deep reinforcement learning (MADRL) is a method that combines deep learning and reinforcement learning in a multi-agent system. Deep learning can use deep neural networks to effectively extract features from information; reinforcement learning continuously strengthens decision-making capabilities through dynamic interaction with the environment; and multi-agent systems emphasize that agents pay attention to environmental factors while also paying attention to the mutual influence of decisions between agents. Therefore, MADRL is suitable for describing the decision-making and interaction process of multiple agents in a complex environment, and has a wide range of applications in many fields such as robot collaboration, distributed control, and collaborative decision-making.
  • MADRL constructs a system containing multiple agents for a specific problem, and each agent uses a deep reinforcement learning (deep reinforcement learning, DRL) method to describe its decision-making and interaction process.
  • the optimal solution of each agent in a multi-agent system is not only restricted by environmental variables, but also restricted and affected by the behavior of other agents.
  • Multi-agent deep deterministic policy gradient (MADDPG) algorithm: a typical multi-agent deep reinforcement learning algorithm, which is the extension of the deep deterministic policy gradient (DDPG) algorithm to multiple agents, where each agent runs a DDPG model.
  • the MADDPG algorithm implements the multi-agent decision-making process through centralized training and distributed execution. It needs to know the decision information of other agents during training, and only local information is needed to make decisions during execution.
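The "centralized training, distributed execution" split can be illustrated with a toy sketch. Plain Python functions stand in for the actual deep networks; all names, the linear scoring form, and the threshold rule are assumptions for illustration only.

```python
def centralized_critic(joint_obs, joint_actions, weights):
    """Training-time critic: scores the JOINT observations and actions of
    all agents, which is why other agents' decision information must be
    known during training."""
    feats = list(joint_obs) + list(joint_actions)
    return sum(w * f for w, f in zip(weights, feats))

def decentralized_actor(local_obs, threshold):
    """Execution-time actor: each agent decides from its LOCAL
    observation only; no other agent's information is needed."""
    return 1 if local_obs > threshold else 0
```

During training the critic consumes everyone's information; at execution time only the actor runs, on local observations.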
  • Computational power control refers to the behavior of the terminal device adjusting the computational power in the process of processing computational tasks to balance service delay and energy consumption. Since terminal devices in edge computing are energy-limited and have differences in energy consumption sensitivity, controlling the computing power of local processing tasks can effectively improve user experience.
  • the decision-making entities in the problem of edge computing offloading in this application include two types: terminal devices (computing offloaders) and edge nodes (resource providers).
  • When the terminal device generates a computation-intensive task, it decides whether to process the task locally (that is, not perform computing offloading), or to package the computing task and transmit it through the network to the edge node, and then receive the task result returned after the edge node processes the task.
  • the edge node needs to dynamically count the currently available resources and allocate computing resources to the required terminal devices according to the current environmental conditions and terminal status.
  • Resource-constrained terminal devices need to determine the correspondence between the computing tasks to be processed and the available computing resources based on the current wireless environment, computing capabilities, and energy modes, that is, to decide where and how to execute which computing task.
  • On the one hand, local execution avoids the overhead caused by transmitting tasks over the network; on the other hand, computing offloading reduces the time and energy consumed in performing computing tasks.
  • FIG. 3 is a schematic diagram of a DRL-based computational offloading method.
  • each terminal device has a computationally intensive and delay-sensitive task that needs to be completed.
  • the terminal device can choose to process the calculation task locally, or can choose to offload the calculation task to the remote end through the base station. Since the wireless transmission rate during the calculation of the offloading process is interfered by the background noise and the communication of other terminal devices, the terminal device needs to weigh the benefits of the local processing and the calculation of offloading to make a decision.
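The trade-off the terminal device weighs can be sketched as a simple cost comparison. The weighted delay-plus-energy cost form and all parameter names below are assumptions for illustration; the actual decision rule in this application is learned by the DRL/MADRL agent rather than computed by a fixed formula.

```python
def choose_mode(local_delay, local_energy, tx_delay, remote_delay, tx_energy,
                w_time=1.0, w_energy=1.0):
    """Compare a local-processing cost against an offloading cost, each a
    weighted sum of delay and energy (weights w_time / w_energy assumed)."""
    local_cost = w_time * local_delay + w_energy * local_energy
    offload_cost = w_time * (tx_delay + remote_delay) + w_energy * tx_energy
    return "local" if local_cost <= offload_cost else "offload"
```

When wireless transmission is cheap relative to local computation, offloading wins; when interference makes transmission slow or costly, local execution wins.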
  • When the terminal device runs the Actor-Critic algorithm, it maintains two neural networks, Actor and Critic, at the same time, each corresponding to its own set of strategy parameters.
  • The input of the Actor is the current state, and the output is the strategy action that can obtain the highest reward in the current state.
  • The input of the Critic is the current state, and the output is an estimated reward that is compared against the reward returned by the environment.
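A minimal sketch of the two roles, with lookup tables standing in for the neural networks. The greedy action choice and the temporal-difference error are common Actor-Critic conventions, assumed here for illustration; they are not quoted from this application.

```python
def actor_action(action_scores):
    """Actor: given scores for each action in the current state, output the
    strategy action expected to obtain the highest reward."""
    return max(range(len(action_scores)), key=lambda a: action_scores[a])

def critic_td_error(reward, v_s, v_s_next, gamma=0.9):
    """Critic signal: the temporal-difference error compares the reward the
    environment returned (plus the discounted next-state value) with the
    critic's own estimate of the current state's value."""
    return reward + gamma * v_s_next - v_s
```

A positive TD error means the environment returned more than the Critic expected, which would push the Actor toward that action.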
  • The above single-agent DRL assumes that the decision-making environment is stable and will not change due to the decision itself. This is difficult to guarantee when multiple terminal devices simultaneously decide whether to offload computation. Specifically, the offloading decision corresponds to competition among terminal devices over the wireless channel and external computing resources, and at the same time affects the decision reward. Therefore, for each terminal device, its own decision-making process may lead to changes in the decisions of other terminal devices, which in turn triggers dynamic changes in the environment of the DRL. When DRL's assumption of environmental stability does not hold, the algorithm will no longer converge, degrading the performance of the solution method.
  • Existing DRL-based methods focus on the interaction process between the agent and the environment, and pay less attention to the interaction mechanism and information content between agents, which makes it difficult for an agent to reasonably obtain more effective information from the environment and limits further improvement of decision-making benefits.
  • each agent in MADRL still uses a set of DRL methods, which includes a system of perception and control, and consists of five parts: agent, decision-making action, state, reward and environment.
  • the agent is the decision-making entity in the system.
  • The input S in Figure 4 reflects the current state of the environment for an agent, and the output A is the action performed by the agent in the given state. All agents share the same environment, but their actions, states, and rewards are often different.
  • the environment is used to interact with multiple agents, maintain the current state set of all agents, show the corresponding current states to the agents, and generate corresponding rewards and the next state set according to the actions of all agents.
  • the training process of MADRL is the process of obtaining the optimal strategy of the agent in a given environment.
  • This process is realized through continuous interaction between the multiple agents and the environment: each agent obtains a high-dimensional observed state S from the environment, uses a method similar to deep learning (DL) to perceive and analyze the observed state, evaluates each feasible action based on the expected return, and takes the decision-making action for the current state according to its strategy.
  • The environment reacts uniformly to the decision-making actions of all agents, producing the next state set S′ and the corresponding reward set R.
  • the difference from single agent DRL is that the strategy learning in MADRL is a centralized process.
  • Each agent can know the strategies of all agents for the current observed state, as well as the decision reward set R and the next state set of all agents, so as to avoid the mutual influence of agent decisions during the learning process.
  • This avoids non-stationarity. In the MADRL strategy inference process, each agent makes independent decisions based on its own observed state of the environment.
  • the strategy learning and inference process described above is an artificial intelligence method similar to group behavior.
  • this application provides a computing offloading method based on multi-agent deep reinforcement learning
  • the method specifically, by optimizing and weighing the two main indicators of service delay and energy consumption, can further improve the decision-making revenue of computing offloading tasks, and achieve the purpose of enhancing user experience and improving resource efficiency.
  • the revenue of the terminal device is composed of two stages of performance: network transmission and task calculation.
  • the time consumption of task calculation is affected by the resource allocation of edge nodes.
  • The processing cost of network transmission is composed of two parts: wireless link transmission and wired link transmission. Considering that the wireless network environment in the terminal-edge (terminal device-edge node) interaction is more complex, and environmental impact and transmission interference are more serious, this application focuses on environmental noise and bandwidth contention in the wireless link.
  • the range of environmental noise includes communication interference on the channel caused by background noise and other devices that do not participate in the calculation of the offloading decision
  • the range of bandwidth contention includes the mutual communication interference of all terminal devices that use the channel and participate in the calculation of the offloading decision.
  • The processing time of a computing offloading procedure is usually on the order of seconds or less. At this time granularity, the effect of terminal device movement on the connection state with the edge node and the wireless access point is predictable. Therefore, when considering the computing offloading decision, this application assumes by default that the state of the links between the terminal device and the edge node, and between the terminal device and the wireless access point, does not change during the decision and execution of a computing offloading decision.
  • FIG. 5 is a schematic flowchart of the computing offloading method provided by this application.
  • S510 The first terminal device sends the first state of the first computing task to the first edge node.
  • the first edge node receives the first state sent by the first terminal device.
  • the state of the computing task of any terminal device may include, but is not limited to: the length of the data stream of the computing task, the time required for the computing task, the penalty value of the computing task, and other computing task-related parameters.
  • mt represents the length of the data stream required to transmit the computing task, in bytes.
  • mc represents the number of clock cycles required to process the computing task.
  • md represents the time limit from the start of execution of the computing task until the terminal obtains the calculation result; md is a fixed, task-related value given by the application that generated the task when the task is generated, in seconds.
  • mp represents the penalty value incurred when the computing task cannot be successfully processed; the penalty value is part of the revenue calculation of the terminal device and the edge node.
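The task state described above maps naturally onto a small record. The field names follow the mt/mc/md/mp notation of this section; the record itself is an illustrative sketch, not the application's actual data format.

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    """State M of a computing task (field names per the description above)."""
    mt: int    # data-stream length needed to transmit the task, in bytes
    mc: int    # clock cycles needed to process the task
    md: float  # execution time limit, in seconds (fixed, set by the generating app)
    mp: float  # penalty value if the task cannot be processed successfully

# M0: the "no pending task" state, all fields zero
NO_TASK = TaskState(0, 0, 0.0, 0.0)
```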
  • the first edge node determines a second offloading decision according to the received state of one or more computing tasks, and the second offloading decision includes the allocation information of the computing resources of the terminal device served by the first edge node.
  • the first edge node may use the MADRL algorithm to determine the second offloading decision according to the received state of one or more computing tasks.
  • the specific decision-making process of the MADRL algorithm will not be described here for the time being, and will be described in detail in the flowchart corresponding to Figure 6.
  • S520 The first terminal device receives the second offloading decision sent by the first edge node.
  • the first edge node sends the second offloading decision to the first terminal device.
  • the first terminal device determines the first offloading decision of the first computing task according to the second offloading decision.
  • the first offloading decision is used to instruct the first terminal device to locally calculate the first computing task or the first terminal device to offload the first computing task to the first edge node for calculation.
  • The first offloading decision may also include the second working power.
  • The second working power is the working power, determined by the first terminal device according to the MADRL algorithm, for computing the first computing task locally.
  • Otherwise, the working power of the terminal device is the sleep power of the first terminal device.
  • the first terminal device may use the MADRL algorithm to determine the first offloading decision of the first computing task according to the received second offloading decision.
  • the specific decision-making process of the MADRL algorithm will not be described here for the time being, and will be described in detail in the flowchart corresponding to Figure 6.
  • The above technical solution describes the interaction process and the specific content of the interaction information between the terminal device and the edge node during computing offloading.
  • Through this information interaction, both the terminal device and the edge node gain a broader perception of the environment in actual decision-making, which can effectively improve the decision-making benefits of both, while the information interaction process is simple and easy to implement.
  • FIG. 6 is a schematic flowchart of the computing offloading method provided by this application.
  • the decision-making process between the terminal equipment and the edge node is a closed-loop process, and its static state is the "waiting" step, and the execution process is triggered at the beginning of each time slice.
  • the decision-making entity in this application includes two types: one or more terminal devices and one or more edge nodes. Take terminal device i and edge node j as examples for illustration.
  • The offline training process of the MADRL algorithm can be performed at any location; only the trained edge node parameter model needs to be deployed on the corresponding edge node, and the trained parameter models of one or more terminal devices can be deployed on each terminal device, thereby realizing the computing offloading method based on the MADRL algorithm.
  • The computing offloading process of terminal device i is as follows:
  • S601 The decision-making process of the terminal device i (that is, an example of the first terminal device) is in a waiting state, and S602 is executed when each time slice starts.
  • the time modeling method based on the MADRL algorithm in this method is the same as the time modeling method in other related methods.
  • The continuous time is divided into disjoint time slices of length τ seconds (τ > 0).
  • The length τ of a time slice can be determined according to the specific conditions of the service to be offloaded, which is not specifically limited in this application.
  • S602 The terminal device i judges whether there is a task to be processed, if not, skip to S601, otherwise, execute S603.
  • The execution process in which terminal device i processes a computing task opens a new thread from the decision-making process and executes the computing task in the new thread with the decision results. It should be understood that terminal device i processes one computing task in a given time slice, and the processing of a single computing task may last for several time slices (i.e., processing rounds). Task generation and execution are assumed not to consider the influence of a queue mechanism on the decision action; that is, if a new computing task event is generated while a computing task is being processed, the terminal device will ignore the new task and generate the corresponding task penalty value. Similarly, if the terminal device finds when making a decision that the task execution time would exceed the limit, it will ignore the task and generate a penalty value. For the penalty value of a task, refer to the definition of mp in S510.
  • The number of rounds (that is, processing rounds) in which the terminal device's time-slice state is busy processing a single computing task can be calculated as: round = ⌈T/τ⌉.
  • The processing time T refers to the length of time from the time slice in which the terminal device decides how to execute the task to the completion of task execution, and ⌈·⌉ is the ceiling function, which means rounding up.
  • the calculation of the duration of the local processing task and the offloading calculation task will be described in detail in the [Revenue Index] below, and the description will not be expanded here.
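The busy-round count above is a ceiling division, which can be sketched directly:

```python
import math

def processing_rounds(processing_time, slice_len):
    """Number of time slices (rounds) a task keeps the device busy:
    round = ceil(T / tau), per the description above."""
    return math.ceil(processing_time / slice_len)
```

For example, a task whose processing time spans two and a half slices occupies three rounds.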
  • the terminal device i sends a task summary to the edge node j (that is, an example of the first edge node).
  • the edge node j receives the task summary sent by the terminal device i.
  • the task summary includes the state M i of the to-be-processed computing task X (that is, an example of the first state of the first computing task).
  • S605 is executed.
  • M_i = {mt_i, mc_i, md_i, mp_i}.
  • If there is no pending task, the state is M_0 = {0, 0, 0, 0}.
  • the task summary also includes the current channel noise w i and the current channel gain H i .
  • the terminal device i can send the above information to the edge node j in the form of a network message, and a specific message content form is
  • the terminal device i receives the status information broadcast by the edge node j.
  • the edge node j broadcasts the status information to the served terminal device through the connected wireless access point.
  • the state information includes the resource allocation strategy of the edge node j for the terminal device it serves (that is, an example of the second offloading decision).
  • a specific state content of edge node j can be
  • the decision sequence number d+j of the edge node is to distinguish it from the decision sequence number i of the terminal device.
  • a_(j,i) ∈ {0, 1, ..., a_(j,avail)} represents the number of computing resource units allocated by edge node j to terminal device i, where 0 means no computing resources are allocated and a_(j,avail) means all currently available resources are allocated.
  • the terminal device i uses the MADRL algorithm to determine the action decision (that is, an example of the first offloading decision) according to the state information broadcast by the edge node j and its own static state (including the partial observation of the environment by the terminal device i).
  • The static state X_i of terminal device i and its corresponding static state space are as follows. Note that the static state is not the state at the moment the decision is made in the multi-agent (i.e., multiple terminal devices) deep reinforcement learning algorithm.
  • The static state space corresponding to a terminal device refers to the state space formed by the terminal device's partial observation of the problem environment when the terminal device and the edge node do not exchange state information; here M_0 means no task, and M_i_1, M_i_2, ..., M_i_max are all possible computing task types of terminal device i.
  • w_i_1, w_i_2, ..., w_i_max are the discretized levels of environmental interference; the granularity can be determined according to the fluctuation range of the environmental interference.
  • MD_i = max(md_i_1, md_i_2, ..., md_i_max), that is, the maximum execution time limit over all computing tasks of terminal device i.
  • the MADRL-based computational offloading decision-making process is completed.
  • For the revenue index, please refer to the text below; the description will not be expanded here.
  • The termination condition includes a convergence criterion, for example: after the algorithm has run for some time, it becomes difficult to obtain a smaller cost value with further execution; at that point the algorithm is considered to have converged and to have reached the termination condition.
  • the termination condition includes a time standard.
  • A time standard is, for example, the execution time of the algorithm or the number of iterations, given in advance.
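Both termination criteria can be sketched together. The window size and tolerance below are illustrative parameters, not values from this application.

```python
def should_terminate(cost_history, max_iters, window=5, tol=1e-3):
    """Combine the two termination conditions described above."""
    # Time criterion: stop once a preset iteration budget is exhausted.
    if len(cost_history) >= max_iters:
        return True
    # Convergence criterion: stop when the best cost has stopped improving
    # meaningfully over the most recent window of iterations.
    if len(cost_history) >= 2 * window:
        recent = min(cost_history[-window:])
        earlier = min(cost_history[:-window])
        if earlier - recent < tol:
            return True
    return False
```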
  • the action decision may further include calculating a power factor (that is, an example of the second working power).
  • p i is the working power corresponding to the minimum cost value of the first calculation task when the MADRL algorithm reaches the termination condition.
  • the p i here represents the power gear, and does not represent a specific power value.
  • 0 means that the terminal device processes tasks with the power value of the lowest gear.
  • the power value corresponding to the lowest gear here is the sleep power P sleep .
  • P sleep is a relatively low power value, which does not mean that the sleep power is 0.
  • p max indicates that the task is processed in the maximum power gear, and the power of the actual processing task is the maximum working power value P max of the terminal device.
  • The frequency at which terminal device i processes a computing task under power factor p_i is determined by two constant values that are unrelated to the decision option, which represent the frequency at which the terminal device processes computing tasks at sleep power and at maximum power, respectively.
  • A power factor is designed in this application to realize the function of adjusting the computing frequency.
  • When the terminal device decides to execute the task locally with p_i as the power factor for processing the computing task, the terminal device first determines whether p_i equals 0: if p_i is not equal to 0 (that is, not sleep power), the terminal device needs to switch from the sleep power to the power corresponding to p_i to perform the computing task, so it calls the system API to switch the computing frequency, which incurs a certain time delay and energy overhead. The calculation of local task-processing delay and energy cost will be described in detail in the [Revenue Index] section below, and is not expanded here.
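A sketch of the power-factor-to-frequency mapping. This section only states that two decision-independent constants give the frequency at sleep power and at maximum power, so the linear interpolation between gears, and the concrete frequency values, are assumptions of this sketch.

```python
F_SLEEP_HZ = 0.2e9  # assumed processing frequency at sleep power, Hz
F_MAX_HZ = 2.0e9    # assumed processing frequency at maximum power, Hz

def frequency_for_power_factor(p, p_max):
    """Map a power gear p in {0, ..., p_max} to a processing frequency,
    interpolating linearly between the sleep and maximum frequencies."""
    if not 0 <= p <= p_max:
        raise ValueError("power factor out of range")
    return F_SLEEP_HZ + (p / p_max) * (F_MAX_HZ - F_SLEEP_HZ)
```

Gear 0 yields the sleep-power frequency and gear p_max the maximum-power frequency, matching the two endpoints described above.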
  • the decision for terminal equipment includes the following constraints:
  • Terminal device i determines, based on the decided action A_i and the task processing duration, whether the task will time out: when T_i is less than md_i, terminal device i determines the processing rounds round_i of computing task X and then jumps to S607; when T_i is greater than the task time limit md_i, the execution of computing task X will time out, so terminal device i directly ignores the task, generates the corresponding penalty value mp_i, and then jumps to S601.
  • When T_i is equal to md_i, terminal device i executes S601 or S607 according to the actual computing task, which is not specifically limited in this application.
  • the terminal device i starts a new thread to process the computing task X in a decision-making manner, and then jumps to S601.
  • the opened new thread will obtain the relevant information of the decision A i and the calculation task X, and start to execute S608 to S612 in an asynchronous manner.
  • the execution process of the computing task X of the terminal device i is a process step executed in the new thread after the new thread is started in the decision-making process, so the decision-making process and the execution process will not block each other.
  • Terminal device i can only process one computing task in the same time slice, and the processing of a single computing task may last for several time slices (that is, processing rounds).
  • the new thread steps for terminal device i to process computing task X are as follows:
  • Terminal device i judges whether the decision action mode of the current decision A_i is the computing offload mode; if so, execute S609, otherwise jump to S611.
  • the terminal device i sends the information required for processing the computing task X to the edge node j.
  • the wireless transmission rate will affect the processing delay, and the wireless transmission rate is also affected by environmental noise and bandwidth contention.
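The effect of noise and contention on the wireless rate can be illustrated with a Shannon-capacity-style sketch. This section does not give the rate formula, so the SINR form below is an assumption for illustration.

```python
import math

def wireless_rate_bps(bandwidth_hz, tx_power_w, channel_gain, noise_w, interference_w):
    """Shannon-style achievable rate: the rate falls as background noise
    and other devices' interference rise, raising the processing delay."""
    sinr = tx_power_w * channel_gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1 + sinr)
```

With more interference from contending devices, the same transmit power yields a lower rate, so transmitting the mt bytes of a task takes longer.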
  • S610 The terminal device i receives the processing result of the task X returned by the edge node j, and then jumps to S612.
  • The computing offloading process of edge node j is as follows:
  • S701 The decision process of the edge node j is in a waiting state, and the edge node j executes S702 or S703.
  • the edge node j executes S702 at the beginning of each time slice.
  • the edge node j executes S703 first, and then executes S702 after S704.
  • S702 Determine currently available computing resources of the edge node j.
  • S703 The edge node j starts to receive the task summary of the terminal device being served.
  • an edge node may receive one or more task summaries sent by the served terminal devices, so the process will remain for a period of time.
  • S704 The edge node j stops receiving the task summary, and at this time determines the task summary information of the terminal device served in the current time slice, and then jumps to S705.
  • The one or more task summaries received by edge node j include the task summary of computing task X sent by terminal device i, and the task summary includes the state information M_i of computing task X.
  • the task summary also includes the current channel noise w i and the current channel gain H i .
  • the edge node j receives the network packet sent by the terminal device i
  • the edge node j adopts the MADRL method to determine the allocation strategy of currently available resources according to the received one or more task summaries.
  • the decision sequence number d+j of the edge node is to distinguish it from the decision sequence number i of the terminal device.
  • a_(j,i) ∈ {0, 1, ..., a_(j,avail)} represents the number of computing resource units allocated by edge node j to terminal device i, where 0 means no computing resources are allocated and a_(j,avail) means all currently available resources are allocated.
  • f_unit is a constant value irrelevant to the decision option, in hertz, representing the computing frequency corresponding to one unit of computing resources.
  • g (j, i) is the number of computing resources that the edge node j has allocated to the terminal device i before this round of decision-making, that is, the number of resources that the edge node j is using.
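A toy allocation sketch consistent with the quantities above: the edge node splits its currently available resource units among requesting devices, and each allocated unit contributes f_unit of computing frequency. The proportional heuristic stands in for the MADRL allocation policy, which this application does not reduce to a formula; the value of f_unit is assumed.

```python
F_UNIT_HZ = 1e8  # assumed computing frequency per unit of resource, Hz

def allocate(available_units, demanded_cycles):
    """Split available resource units among devices in proportion to the
    cycles each device's task demands (illustrative heuristic only)."""
    total = sum(demanded_cycles.values())
    if total == 0:
        return {dev: 0 for dev in demanded_cycles}
    alloc, remaining = {}, available_units
    for dev, cycles in sorted(demanded_cycles.items()):
        units = min(remaining, round(available_units * cycles / total))
        alloc[dev] = units
        remaining -= units
    return alloc

def allocated_frequency(units):
    """Computing frequency a device receives for its allocated units."""
    return units * F_UNIT_HZ
```

A device allocated a_(j,i) = 0 units receives no edge resources, matching the 0 case in the action definition above.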
  • The state of edge node j then changes (that is, to the fourth state of the first edge node), where X_d+j (that is, the third state of the first edge node) is the state before edge node j received the multiple computing task summaries.
  • The state in which no message has been received from a terminal device n is defined as M_0 (meaning that terminal device n has not sent a message to edge node j at the current time, where terminal device n is a terminal device served by edge node j).
  • the decision for edge nodes includes the following constraints:
  • The edge node only allocates resources to terminal devices it currently serves, that is, there do not exist i ∈ D and j ∈ N such that conn_i ≠ j and a_(j,i) > 0.
  • g (j, i) is the number of resources that the edge node j has allocated to the terminal device i before the start of the current time slice.
  • a specific form of message content can be
  • S706 The edge node j broadcasts its resource allocation strategy through the connected wireless access point, and then jumps to S707.
• S707 The edge node j starts multiple new threads to perform task processing, and then jumps to S701. The number of threads started should be no less than the number of terminal devices that were allocated resources.
• The execution of computing tasks on edge node j is carried out in the new threads started during the decision-making process, so the decision-making process and the execution process do not block each other.
  • the edge node j can also use the thread pool method to establish the following asynchronous process.
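• The asynchronous thread-pool arrangement described above might be sketched as follows; process_task is only a stand-in for steps S708 to S711, and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_task(terminal_id, task_data, allocated_units):
    # Placeholder for S708-S711: receive the task, compute, return the result.
    return terminal_id, sum(task_data) * allocated_units


def dispatch(allocation, tasks, max_workers=4):
    """allocation: terminal id -> allocated resource units (threads only for >0)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_task, t, tasks[t], units)
                   for t, units in allocation.items() if units > 0]
        # Decision-making can continue elsewhere; results are gathered here.
        return dict(f.result() for f in futures)
```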
  • the opened new thread will receive the computing task from the specific terminal device being served, and will start to execute S708 to S711 in an asynchronous manner according to the resource allocation strategy of the decision.
• Each new thread of edge node j processes a computing task as follows.
• The edge node j receives the computing task information of the specific terminal devices it serves. It should be understood that a specific terminal device is, among the terminal devices served by edge node j, one that has decided to offload its computing task to edge node j for computation.
• For example, if the action decision of terminal device i in S608 determines to offload computing task X to edge node j for computation, terminal device i sends the information of computing task X to edge node j, and correspondingly, edge node j receives the information of computing task X sent by terminal device i.
  • S709 The edge node j processes the received computing task according to the resource allocation strategy in S605.
• In a specific implementation, the working time of the processor may be divided into small slices by time-division multiplexing of resources and allocated to each computing thread in proportion to its share of the total allocated resources.
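• The proportional time-division scheme can be sketched as follows, assuming each thread's share of processor time equals its fraction of the total allocated resource units; names are illustrative.

```python
def time_shares(allocations):
    """allocations: terminal id -> allocated resource units.

    Returns each terminal's fraction of processor time under proportional
    time-division multiplexing.
    """
    total = sum(allocations.values())
    if total == 0:
        return {t: 0.0 for t in allocations}
    return {t: units / total for t, units in allocations.items()}
```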
• At this point, terminal device i and edge node j have completed the computation offloading process using the MADRL algorithm.
  • the following mainly introduces the decision profit index of the MADRL algorithm in S606 and S705 in detail.
• The profit index of the terminal equipment is divided into two situations: local execution and computation offloading.
• In the first case, the time for locally executing task X (that is, the task processing time in S603, and an example of the second delay overhead) can be calculated from the following quantities:
• F i represents the frequency at which terminal device i processes the computing task, determined according to the power factor.
• An indicator function is used whose value is 1 when the inequality in its subscript holds and 0 otherwise; it gates the number of clock cycles required for terminal device i to adjust from the sleep power to p i, which is a fixed value for a specific terminal device i.
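• A hedged sketch of the local-execution overheads described above, assuming the delay equals the task's clock cycles (plus the power-adjustment cycles when the chosen working power differs from the previous one) divided by the processing frequency F i, and the energy equals power times time; the patent's exact formulas are not reproduced in the text, so this form is an assumption.

```python
def local_delay(cycles, freq, power, prev_power, adjust_cycles):
    """Local processing delay: task cycles plus (if power changes) the
    power-adjustment cycles, divided by the processing frequency."""
    extra = adjust_cycles if power != prev_power else 0.0
    return (cycles + extra) / freq


def local_energy(power, delay):
    """Local energy consumption: working power times processing time."""
    return power * delay
```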
• The second case is computation offloading.
• 1) The time for offloading and executing task X (that is, an example of the first delay overhead) is the sum of the time to transmit the data stream of the computing task and the processing time at the edge node.
• The interference includes the background noise of other wireless devices on the communication channel that do not participate in the offloading decision, as well as the sum of the interference from other devices that make an offloading decision in the current time slice and use the same channel.
• 2) The energy consumed in offloading and executing task X (that is, an example of the first energy consumption overhead) is the sum of the data transmission energy and the receiving energy, where the receiving energy includes part of the tail energy E tail generated by the waiting process after reception; since receiving the task takes a short time, E tail can generally be approximated as a constant.
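• The offloading overheads can be sketched under a Shannon-rate transmission model (an assumption consistent with the channel gain, noise, and interference quantities above); the edge processing time divides the task's clock cycles by the frequency the edge node allocated. All names are illustrative.

```python
import math


def offload_delay(data_len, bandwidth, p, H, w, I, cycles, edge_freq):
    # Assumed Shannon-rate model: rate = B * log2(1 + p*H / (w + I)).
    rate = bandwidth * math.log2(1 + p * H / (w + I))
    # Total delay = transmission time + edge processing time.
    return data_len / rate + cycles / edge_freq


def offload_energy(p, transmit_time, e_tail):
    # Transmission energy plus the approximately constant tail term E_tail.
    return p * transmit_time + e_tail
```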
• The cost of the terminal device performing the computing task weighs the task execution time against the corresponding energy consumption, and is specifically expressed as:
• The above cost formula C i (A,X) of terminal device i represents the cost value corresponding to the resource consumption generated when terminal device i processes task X in the current state, given the decision set A of all decision entities (that is, all terminal devices and edge nodes).
• α i ∈ (0,1] (that is, an example of the first parameter) is a technical difference factor used to weigh the technical difference between terminal device i and the corresponding edge node in processing tasks. By default, α i = 1.
• β i ≥ 0 (that is, an example of the second parameter) is an energy trade-off factor used to indicate the sensitivity of terminal device i to energy consumption. By default, β i = 1.
• The decision A i here is not the final action decision of terminal device i; it is the decision of one iteration in the learning process of the MADRL algorithm.
• The final decision of terminal device i is the action decision that, when the MADRL algorithm reaches the termination condition, minimizes the cost value of computing task X according to the profit index.
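• A minimal sketch of the terminal's cost index, assuming a weighted sum of delay and energy in which the technical difference factor scales the delay and the energy trade-off factor weights the energy; the exact combination inside the patent's formula C i (A,X) is not shown in the text, so this form is an assumption.

```python
def terminal_cost(delay, energy, alpha=1.0, beta=1.0):
    """alpha in (0, 1]: technical difference factor (default 1).
    beta >= 0: energy trade-off factor (default 1).
    """
    return alpha * delay + beta * energy
```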
  • ct j represents the average cost of the terminal equipment served by the edge node j in the current time slice, namely:
  • cf j (i.e. an example of a fairness cost function) is a fairness factor set according to the number of terminal devices currently using the computing resources of edge node j, which can be specifically expressed as:
  • g′ (j, i) is the number of resources that the edge node j has allocated to the terminal device i at the end of the current time slice.
• K(x) is a non-negative monotonically increasing function; an empirical function can be designed according to actual conditions, which is not limited in this application.
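• The edge node's profit index (the average cost ct j of served terminals combined with the fairness factor cf j = K applied to the number of terminals holding resources) might be sketched as follows; the additive combination and the quadratic choice of K are illustrative assumptions, not the patent's formula.

```python
def edge_cost(served_costs, allocations, K=lambda x: float(x * x)):
    """served_costs: per-terminal costs in the current time slice.
    allocations: terminal id -> allocated resource units.
    K: a non-negative monotonically increasing function (assumed quadratic here).
    """
    ct = sum(served_costs) / len(served_costs) if served_costs else 0.0
    users = sum(1 for units in allocations.values() if units > 0)
    return ct + K(users)
```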
  • the final decision of the edge node j is an action decision determined by calculating the minimum cost value of the edge node j according to the profit index of the edge node when the MADRL algorithm reaches the termination condition.
• The interaction process and interaction information between the terminal device and the edge node in the computation offloading process are described in detail, so that both parties have a wider perception of the environment in actual decision-making, which can effectively improve the decision-making quality of both.
  • the information interaction process is simple and easy to implement, and the corresponding cost of the process is relatively small.
  • the proposed profit index weighs a variety of factors, and at the same time defines the corresponding fairness factor according to the actual situation.
  • the actual environment reflected by the indicators is more comprehensive.
• Benefiting from the advantages of the MADRL algorithm, this application requires fewer necessary approximations, while the stability of the algorithm in solving the problem is guaranteed.
  • the necessary approximation here refers to the approximate assumptions or conditions that have to be introduced when solving the problem, and MADRL is more adaptable to the problem when solving.
• A power control factor is introduced into the profit index for locally processed computing tasks on the terminal device, and the cost loss incurred during power switching is analyzed and resolved. This describes the decision content in computation offloading more finely, which effectively improves user experience and reduces model error.
• The decision items for edge nodes in this application also ensure the average experience of the served terminal devices and improve the fairness of resource usage among network terminals; that is, resources are allocated efficiently while avoiding serving too few users.
• MADDPG: multi-agent deep deterministic policy gradient.
  • the computational offloading method based on MADDPG includes offline training and online derivation process, among which, offline training includes data collection and training process.
  • the offline training process and online derivation process of MADDPG are introduced in conjunction with Figures 7-9.
  • FIG. 7 is a schematic flowchart of MADDPG-based data collection provided by the present application. Among them, the specific steps of the training data collection process are as follows:
• Collect terminal equipment static information: collect device-related static information from the terminal equipment, including the terminal equipment's computing capability, transmission power, association status, task type, etc.
• Establish a distance-gain fitting model: establish a distance-gain fitting model based on dynamic information such as transmission rate and physical distance to obtain the channel gain H of the terminal equipment in the current state, and randomly generate the background noise w.
• Simulate trajectory and task generation events: simulate the terminal movement trajectory and the task generation mechanism according to the given parameters. It is assumed that the terminal moves in random waypoint mode to generate a movement trajectory, and that computing tasks occur as random events following a Bernoulli distribution. Then, skip to (7).
• Collect terminal dynamic information: collect from the terminal the dynamic information required for decision-making, such as channel interference, transmission rate, and task generation events. After the collection is complete, proceed to the next step.
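• The simulated mobility and task generation in the collection steps can be sketched as below, with a simplified random-direction walk standing in for the random waypoint model and a per-time-slice Bernoulli draw for task arrival; all parameter names and values are assumptions.

```python
import math
import random


def simulate(steps, p_task, speed=1.0, seed=0):
    """Generate (position, task_arrived) pairs for `steps` time slices."""
    rng = random.Random(seed)
    x = y = 0.0
    trace = []
    for _ in range(steps):
        # Simplified random-direction step (stand-in for random waypoint).
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += speed * math.cos(angle)
        y += speed * math.sin(angle)
        task = rng.random() < p_task  # Bernoulli task arrival
        trace.append(((x, y), task))
    return trace
```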
  • FIG. 8 is a schematic flowchart of the MADDPG-based parameter model training provided by the present application.
  • the initial state of the system includes: initial task status, network link status, edge node connection status, and network noise status.
  • the processing round is 0 (that is, the terminal device is in an idle state at this time).
  • the initial state of the system is (0,0,0,...,0), which means that no computing resources have been allocated to the terminal devices being served at the beginning.
• For the edge node to execute the computation offloading process, it first determines its decision based on the strategy parameters, the observed state, and a random function.
  • the terminal device determines the corresponding decision according to the strategy parameter, the observation state and the random function.
  • the training process obtains the action cost C of the terminal device and the edge node for the task execution in the round of time slices, and generates the next system state X'.
  • the training process of each terminal device and edge node is successively executed, and the policy gradient of the corresponding model is updated.
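• Structurally, the training loop described above can be sketched as follows: each episode starts from the initial system state, every agent (terminal device or edge node) acts from its policy with exploration, the environment returns per-agent costs and the next state X', and each agent's policy gradient is updated in turn. The env/agent interfaces here are assumptions, not the patent's API.

```python
def train(env, agents, episodes, steps):
    """Skeleton of a multi-agent training loop (interfaces are assumed)."""
    for _ in range(episodes):
        state = env.reset()                      # initial system state
        for _ in range(steps):                   # one time slice per step
            actions = [a.act(state, explore=True) for a in agents]
            costs, next_state = env.step(actions)
            for a, c in zip(agents, costs):      # successive gradient updates
                a.update(state, actions, c, next_state)
            state = next_state
```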
  • FIG. 9 is a schematic flowchart of a method for computing offloading based on MADDPG provided by the present application.
• Fig. 9 is basically the same as the computation offloading process described in Fig. 6, so only a brief description is given here.
  • the edge node performs the initialization process according to the obtained parameter information.
  • the terminal device uploads the static information of the terminal device such as the type of computing task and the computing capability.
• The edge node issues model parameters for similarly configured terminal devices according to the above information, so that the terminal can initialize its model parameters and perform the time-slice synchronization process.
  • the terminal device executes the MADRL derivation process according to the broadcast information of the edge node and its own state information.
• In the MADDPG algorithm, only the Actor parameters are used to decide the corresponding action during online derivation.
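• The Actor-only online decision step might be sketched as picking the minimum-cost action from the trained policy's scores, with no critic and no exploration noise; the policy interface here is an assumption.

```python
def actor_decide(policy, observation):
    """policy: maps an observation to a per-action score list (trained Actor).

    Returns the index of the lowest-cost action (greedy, no exploration).
    """
    scores = policy(observation)
    return min(range(len(scores)), key=scores.__getitem__)
```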
  • the terminal equipment and edge nodes complete the decision-making process and execute corresponding actions.
  • FIG. 10 is a schematic block diagram of a communication device 1000 provided in this application.
  • the communication device 1000 includes a receiving unit 1100, a sending unit 1200, and a processing unit 1300.
• the sending unit 1200 is configured to send a first state of a first computing task to a first edge node, where the first edge node is an edge node from which the device obtains computing resources, and the first state includes at least one of the following: the length of the data stream for transmitting the first computing task, the number of clock cycles required to compute the first computing task, and the penalty value of the first computing task;
  • the receiving unit 1100 is configured to receive a second offloading decision sent by the first edge node, the second offloading decision is determined by the first state, and the second offloading decision includes computing resource allocation information of one or more second terminal devices.
  • the second terminal device is a terminal device that obtains computing resources from the first edge node, and the device is one terminal device among one or more second terminal devices;
  • the processing unit 1300 is configured to determine a first offloading decision of the first computing task according to the second offloading decision, and the first offloading decision is used to indicate whether to offload the first computing task to the first edge node for calculation.
  • the receiving unit 1100 and the sending unit 1200 can also be integrated into one transceiver unit, which has both receiving and sending functions, which is not limited here.
  • the sending unit 1200 is further configured to:
• the receiving unit 1100 is further configured to receive the computing result of the first computing task sent by the first edge node; or, when the first offloading decision instructs the communication device not to offload the first computing task, the processing unit 1300 is further configured to determine the computing result of the first computing task locally.
• the processing unit 1300 is specifically configured to: update the parameters in the first state of the first computing task according to the second offloading decision to obtain the second state of the first computing task; calculate the cost value of the first computing task according to the second state, where the cost value includes the local cost and the offloading cost of the first computing task; and determine the first offloading decision of the first computing task according to the cost value of the first computing task.
• the processing unit 1300 is specifically configured to use the first cost function in the multi-agent deep reinforcement learning (MADRL) algorithm to determine the cost value of the first computing task according to the second state, where the first cost function includes an offload cost function and a local calculation cost function: the offload cost function is used to determine the offload cost of the first computing task, and the local calculation cost function is used to determine the local cost of the first computing task. The processing unit 1300 is also configured to iteratively update the state of the device and the cost value of the first computing task according to the MADRL algorithm; when the MADRL algorithm reaches the termination condition, the processing unit 1300 is further configured to determine the first offloading decision of the first computing task according to the cost value of the first computing task.
• the offloading overhead of the first computing task includes a first energy consumption overhead and a first delay overhead, where the first energy consumption overhead includes the energy consumed by the device in offloading the first computing task to the first edge node, and the first delay overhead includes the delay for the device to offload the first computing task to the first edge node and the delay for the first edge node to determine the calculation result of the first computing task.
  • the local overhead of the first computing task includes a second energy consumption overhead and a second delay overhead
  • the second energy consumption overhead includes the energy consumed by the device locally calculating the first computing task
• the second delay overhead includes the delay of the device locally computing the first computing task and the delay of the device switching from the sleep power to the first working power P 1
  • the first working power P 1 is the working power of the local computing task of the device.
  • the first offloading decision further includes a second working power
  • the second working power is the working power corresponding to the minimum cost value of the first calculation task when the MADRL algorithm reaches the termination condition.
• when the first offloading decision instructs the device to offload the first computing task to the first edge node for calculation, the processing unit 1300 is further configured to operate at the sleep power.
• the processing unit 1300 is further configured to use a first parameter to dynamically adjust the first delay overhead, where the first parameter is used to indicate the technical difference between the processing unit 1300 and the first edge node in processing computing tasks.
• the processing unit 1300 is further configured to use a second parameter to dynamically adjust the first energy consumption overhead and the second energy consumption overhead, where the second parameter is used to indicate the sensitivity of the processing unit 1300 to energy consumption.
  • the communication apparatus 1000 may be the first terminal device in the method embodiment.
  • the receiving unit 1100 may be a receiver
  • the sending unit 1200 may be a transmitter.
  • the receiver and transmitter can also be integrated into one transceiver.
  • the communication apparatus 1000 may be a chip or an integrated circuit in the first terminal device.
  • the receiving unit 1100 and the sending unit 1200 may be communication interfaces or interface circuits.
  • the receiving unit 1100 is an input interface or an input circuit
  • the sending unit 1200 is an output interface or an output circuit.
  • the processing unit 1300 may be a processing device.
  • the function of the processing device can be realized by hardware, or by hardware executing corresponding software.
  • the processing device may include at least one processor and at least one memory, wherein the at least one memory is used to store a computer program, and the at least one processor reads and executes the computer program stored in the at least one memory, so that The communication device 1000 executes operations and/or processing performed by the first terminal device in each method embodiment.
  • the processing device may only include a processor, and the memory for storing the computer program is located outside the processing device.
  • the processor is connected to the memory through a circuit/wire to read and execute the computer program stored in the memory.
  • the processing device may also be a chip or an integrated circuit.
  • FIG. 11 is a schematic block diagram of a communication device 2000 provided by this application.
  • the communication device 2000 includes a receiving unit 2100, a sending unit 2200, and a processing unit 2300.
• the receiving unit 2100 is configured to receive the states of one or more tasks, where the states of the one or more tasks include the first state of the first computing task sent by the first terminal device, the apparatus is an edge node that provides computing resources to one or more second terminal devices, and the first terminal device is one of the one or more second terminal devices;
  • the processing unit 2300 is configured to determine a second offloading decision according to one or more task states, where the second offloading decision includes computing resource allocation information of the processing unit 2300 to one or more second terminal devices;
• the sending unit 2200 is configured to broadcast the second offloading decision to the one or more second terminal devices.
  • the receiving unit 2100 and the sending unit 2200 can also be integrated into one transceiver unit, which has both receiving and sending functions, which is not limited here.
  • the receiving unit 2100 is further configured to receive the first calculation task sent by the first terminal device; the processing unit 2300 is further configured to determine the calculation result of the first calculation task; the sending unit 2200 is further configured to It is used to send the calculation result of the first calculation task to the first terminal device.
• the processing unit 2300 is specifically configured to update the third state of the first edge node according to the states of the one or more tasks to obtain the fourth state of the first edge node, where the third state is the state before the first edge node receives the one or more task states; the processing unit 2300 is further configured to determine the cost value of the device according to the fourth state, where the cost value is the cost for the processing unit 2300 to allocate computing resources to the one or more computing tasks; and the processing unit 2300 determines the second offloading decision based on the cost value.
  • the processing unit 2300 is specifically configured to use the first cost function and the second cost function in the multi-agent deep reinforcement learning MADRL algorithm to determine the cost value according to the fourth state;
• the first cost function includes an offload cost function and a local calculation cost function, where the offload cost function is used to determine the offload cost of the one or more tasks, and the local calculation cost function is used to calculate the local cost of the one or more tasks; the second cost function includes an average cost function and a fair cost function, where the average cost function is used to determine the average cost of the one or more tasks according to their offload costs and local costs, and the fair cost function is used to determine the fair cost of the device according to the number of terminal devices using the device's computing resources. The processing unit 2300 is specifically configured to determine the cost value of the device according to the average cost and the fair cost of the one or more tasks.
• the processing unit 2300 is specifically configured to iteratively update the state of the first edge node and the cost value of the first edge node according to the MADRL algorithm; when the MADRL algorithm reaches the termination condition, the processing unit 2300 is further configured to determine the second offloading decision based on the cost value of the first edge node.
  • the communication device 2000 may be the first edge node in the method embodiment.
  • the receiving unit 2100 may be a receiver, and the sending unit 2200 may be a transmitter.
  • the receiver and transmitter can also be integrated into one transceiver.
  • the communication device 2000 may be a chip or an integrated circuit in the first edge node.
  • the receiving unit 2100 and the sending unit 2200 may be communication interfaces or interface circuits.
  • the receiving unit 2100 is an input interface or an input circuit
  • the sending unit 2200 is an output interface or an output circuit.
• the processing unit 2300 may be a processing device.
  • the function of the processing device can be realized by hardware, or by hardware executing corresponding software.
  • the processing device may include at least one processor and at least one memory, wherein the at least one memory is used to store a computer program, and the at least one processor reads and executes the computer program stored in the at least one memory, so that The communication device 2000 executes operations and/or processing performed by the first edge node in each method embodiment.
  • the processing device may only include a processor, and the memory for storing the computer program is located outside the processing device.
  • the processor is connected to the memory through a circuit/wire to read and execute the computer program stored in the memory.
  • the processing device may also be a chip or an integrated circuit.
  • the communication device 10 includes: one or more processors 11, one or more memories 12 and one or more communication interfaces 13.
  • the processor 11 is used to control the communication interface 13 to send and receive signals
  • the memory 12 is used to store a computer program
• the processor 11 is used to call and run the computer program from the memory 12, so that the processes and/or operations performed by the terminal device in the method embodiments are executed.
• the processor 11 may have the function of the processing unit 1300 shown in FIG. 10, and the communication interface 13 may have the functions of the sending unit 1200 and/or the receiving unit 1100 shown in FIG. 10.
• the processor 11 may be used to execute the processing or operations executed internally by the first terminal device in FIG. 5 and FIG. 6, and the communication interface 13 may be used to execute the sending and/or receiving actions performed by the first terminal device in FIG. 5 and FIG. 6.
  • the communication apparatus 10 may be the first terminal device in the method embodiment.
  • the communication interface 13 may be a transceiver.
  • the transceiver may include a receiver and a transmitter.
  • the processor 11 may be a baseband device, and the communication interface 13 may be a radio frequency device.
  • the communication device 10 may be a chip or an integrated circuit installed in the first terminal device.
  • the communication interface 13 may be an interface circuit or an input/output interface.
  • FIG. 13 is a schematic structural diagram of the communication device 20 provided by this application.
  • the communication device 20 includes: one or more processors 21, one or more memories 22 and one or more communication interfaces 23.
• the processor 21 is used to control the communication interface 23 to send and receive signals, and the memory 22 is used to store a computer program; the processor 21 is used to call and run the computer program from the memory 22, so that the processes and/or operations performed by the first edge node in the method embodiments are executed.
  • the processor 21 may have the function of the processing unit 2300 shown in FIG. 11, and the communication interface 23 may have the function of the sending unit 2200 and/or the receiving unit 2100 shown in FIG. 11.
• the processor 21 may be used to perform the processing or operations performed internally by the first edge node in FIG. 5 and FIG. 6, and the communication interface 23 may be used to perform the sending and/or receiving actions performed by the first edge node in FIG. 5 and FIG. 6.
  • the communication device 20 may be the first edge node in the method embodiment.
  • the communication interface 23 may be a transceiver.
  • the transceiver may include a receiver and a transmitter.
  • the communication device 20 may be a chip or an integrated circuit installed in the first edge node.
  • the communication interface 23 may be an interface circuit or an input/output interface.
  • the memory and the processor in the foregoing device embodiments may be physically independent units, or the memory and the processor may be integrated together, which is not limited herein.
  • the present application also provides a computer-readable storage medium in which computer instructions are stored.
• When the computer instructions run on a computer, the operations and/or processes performed by the first terminal device in the method embodiments of the present application are executed.
  • the present application also provides a computer-readable storage medium in which computer instructions are stored.
• When the computer instructions run on a computer, the operations and/or processes performed by the first edge node in the method embodiments of the present application are executed.
  • the present application also provides a computer program product.
  • the computer program product includes computer program code or instructions.
• When the computer program code or instructions run on a computer, the operations and/or processes performed by the first terminal device in the method embodiments of the present application are executed.
  • the present application also provides a computer program product.
  • the computer program product includes computer program code or instructions.
• When the computer program code or instructions run on a computer, the operations and/or processes performed by the first edge node in the method embodiments of the present application are executed.
  • the present application also provides a chip including a processor.
  • the memory for storing the computer program is provided independently of the chip, and the processor is used to execute the computer program stored in the memory, so that the operation and/or processing performed by the first terminal device in any method embodiment is executed.
  • the chip may also include a communication interface.
  • the communication interface may be an input/output interface, or an interface circuit or the like.
  • the chip may also include the memory.
  • the application also provides a chip including a processor.
  • the memory for storing the computer program is provided independently of the chip, and the processor is used to execute the computer program stored in the memory, so that the operation and/or processing performed by the first edge node in any method embodiment is executed.
  • the chip may also include a communication interface.
  • the communication interface may be an input/output interface, or an interface circuit or the like.
  • the chip may also include the memory.
  • the present application also provides a communication device (for example, a chip), including a processor and a communication interface, the communication interface is used to receive a signal and transmit the signal to the processor, and the processor processes The signal causes the operation and/or processing performed by the first terminal device in any method embodiment to be executed.
  • the present application also provides a communication device (for example, a chip), including a processor and a communication interface, where the communication interface is used to receive a signal and transmit the signal to the processor, and the processor processes the Signal so that operations and/or processing performed by the first edge node in any method embodiment are performed.
  • the present application also provides a communication device, including at least one processor, where the at least one processor is coupled with at least one memory, and the at least one processor is configured to execute a computer program or instruction stored in the at least one memory, so that the operation and/or processing performed by the first terminal device in any method embodiment is executed.
  • the present application also provides a communication device, including at least one processor, where the at least one processor is coupled with at least one memory, and the at least one processor is configured to execute a computer program or instruction stored in the at least one memory, so that the operation and/or processing performed by the first edge node in any method embodiment is executed.
  • this application also provides a first terminal device, including a processor, a memory, and a transceiver.
  • the memory is used to store computer programs
  • the processor is used to call and run the computer programs stored in the memory, and control the transceiver to send and receive signals, so that the first terminal device performs the operation and/or processing performed by the first terminal device in any method embodiment.
  • This application also provides a first edge node, including a processor, a memory, and a transceiver.
  • the memory is used to store computer programs
  • the processor is used to call and run the computer programs stored in the memory, and control the transceiver to send and receive signals, so that the first edge node performs the operation and/or processing performed by the first edge node in any method embodiment.
  • the present application also provides a wireless communication system, including the first terminal device and the first edge node in the embodiments of the present application.
  • the processor in the embodiment of the present application may be an integrated circuit chip, which has the ability to process signals.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware encoding processor, or executed and completed by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • A and/or B can mean three cases: A exists alone, both A and B exist, or B exists alone. A and B can each be singular or plural, and are not limited.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disc, or other media that can store program code.

Abstract

本申请提供了一种计算卸载的方法和装置,在网络边缘计算卸载中,边缘节点接收所服务的一个或多个终端设备发送的计算任务的状态,并根据接收到的一个或多个计算任务的状态确定计算资源的分配情况,然后边缘节点向所服务的终端设备广播计算资源的分配情况,终端设备根据该资源分配情况确定是否将计算任务卸载至边缘节点进行计算,从而使得边缘节点和终端设备能够在实际决策中对于环境有更广泛的感知能力,有效地提升两者的决策收益。

Description

计算卸载的方法和通信装置
本申请要求于2020年5月22日提交中国国家知识产权局、申请号为202010438782.2、申请名称为“计算卸载的方法和通信装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信领域,具体涉及一种计算卸载的方法和通信装置。
背景技术
随着互联网方兴未艾,终端设备更新换代,新型应用不断涌现。与此同时,终端设备的计算能力逐渐难以满足用户业务的性能需求。尽管云计算的出现一定程度上缓解了终端的资源限制,然而在业务延迟、传输速率与数据安全等方面仍尚待提升,特别是对于诸如自动驾驶、实时视频监控等高实时高安全的应用上,这促使着云计算模式产生新的变革。
作为云计算的一种概念演进,边缘计算提出在网络边缘处提供计算与存储资源,具有延迟更低、能耗更低、节省带宽、隐私性强、更智能化等特点,并产生了诸如微云(Cloudlet)、雾计算(fog computing)、移动边缘计算(mobile edge computing)、多接入边缘计算(multi-access edge computing,MEC)等衍生概念。作为实现物联网与第五代移动通信技术(5th generation,5G)愿景的关键技术,边缘计算(及其衍生概念)处于计算模式与网络通信的交叉点,对满足种类繁多的业务需求与精益求精的用户体验具有不可替代的作用。
与云计算中的计算卸载方法类似,边缘计算是指终端设备将计算密集的任务卸载(offload)到网络边缘节点进行计算,并接受对应的任务的运行结果以达到计算卸载的目的。具体来说,面向边缘计算有两方面需要具体考虑:一方面,终端设备存在着运算能力较低、能量敏感度高、支持功能较少等不足;另一方面,网络边缘节点也暴露出计算资源分散、供电方式不同、系统架构多样等现象。因此,如何更好地管理与分配边缘节点的计算能力成为亟需关注的重要问题。
发明内容
本申请提供一种计算卸载的方法和通信装置,能够使边缘节点和终端设备在实际决策中对于环境有更广泛的感知能力,从而有效地提升两者的决策收益。
第一方面,提供了一种方法,该方法包括:第一终端设备向第一边缘节点发送第一计算任务的第一状态,其中,第一边缘节点为第一终端设备获取计算资源的边缘节点,所述第一状态包括传输所述第一计算任务的数据流的长度、计算所述第一计算任务所需耗费的时钟周期数、所述第一计算任务的惩罚值中的至少一个;第一终端设备接收第一边缘节点发送的第二卸载决策,第二卸载决策是由第一状态确定的,第二卸载决策包括一个或多个 第二终端设备的计算资源分配信息,第二终端设备为从第一边缘节点获取计算资源的终端设备且第一终端设备为一个或多个第二终端设备中的一个终端设备;第一终端设备根据第二卸载决策确定第一计算任务的第一卸载决策,第一卸载决策用于指示第一终端设备是否卸载第一计算任务至第一边缘节点进行计算。
上述技术方案中,终端设备向边缘节点发送计算任务的状态,边缘节点根据接收到的一个或多个计算任务的状态确定计算资源的分配情况,然后边缘节点向所服务的终端设备发送计算资源的分配情况,终端设备根据该资源分配情况确定是否将计算任务卸载至边缘节点进行计算,能够使终端设备在实际的决策中对于环境有更广泛的感知能力,从而有效地提升两者的决策收益。
结合第一方面,在第一方面的某些实现方式中,当第一卸载决策指示第一终端设备卸载第一计算任务至第一边缘节点进行计算时,第一终端设备向第一边缘节点发送第一计算任务;第一终端设备接收第一边缘节点发送的第一计算任务的计算结果;或者当第一卸载决策指示第一终端设备不卸载所述第一计算任务时,第一终端设备在本地确定第一计算任务的计算结果。
结合第一方面,在第一方面的某些实现方式中,第一终端设备根据第二卸载决策确定第一计算任务的第一卸载决策,包括:第一终端设备根据第二卸载决策,更新第一计算任务的第一状态中的参数得到第一计算任务的第二状态;第一终端设备根据第二状态计算第一计算任务的代价值,第一计算任务的代价值包括第一计算任务的本地开销和卸载开销;第一终端设备根据第一计算任务的代价值确定第一计算任务的第一卸载决策。
上述技术方案中,终端设备根据边缘节点的资源分配情况确定是否将计算任务卸载至边缘节点进行计算,能够使终端设备在实际的决策中对于环境有更广泛的感知能力,从而有效地提升两者的决策收益。
结合第一方面,在第一方面的某些实现方式中,第一终端设备根据第二状态计算第一计算任务的代价值,包括:第一终端设备根据第二状态,使用多智能体深度强化学习MADRL算法中的第一代价函数确定第一计算任务的代价值,第一代价函数包括卸载开销函数和本地计算开销函数,其中,卸载开销函数用于确定第一计算任务的卸载开销,本地计算开销函数用于确定第一计算任务的本地开销;以及第一终端设备根据第一计算任务的代价值确定第一计算任务的第一卸载决策,包括:第一终端设备根据MADRL算法迭代更新第一终端设备的第一计算任务的状态和第一计算任务的代价值;当MADRL算法达到终止条件时,第一终端设备根据第一计算任务的最小代价值确定第一计算任务的第一卸载决策。
上述技术方案中,终端设备根据边缘节点的资源分配情况使用MADRL算法确定是否将计算任务卸载至边缘节点进行计算,能够使终端设备在实际的决策中对于环境有更广泛的感知能力,从而有效地提升两者的决策收益。
结合第一方面,在第一方面的某些实现方式中,第一计算任务的卸载开销包括第一能耗开销和第一时延开销,其中,第一能耗开销包括第一终端设备将第一计算任务卸载至第一边缘消耗的能量,第一时延开销包括第一终端设备将第一计算任务卸载至第一边缘节点的时延以及第一边缘节点确定第一计算任务的计算结果的时延。
结合第一方面,在第一方面的某些实现方式中,第一计算任务的本地开销包括第二能耗开销和第二时延开销,其中,第二能耗开销包括第一终端设备本地计算第一计算任务消耗的能量和第一终端设备从休眠功率P sleep切换到第一工作功率P 1消耗的能量,第二时延开销包括第一终端设备本地计算第一计算任务的时延和第一终端设备从休眠功率切换到第一工作功率P 1的时延,第一工作功率P 1为第一终端设备本地计算任务的工作功率。
上述技术方案中,在终端设备本地处理计算任务的代价值计算中,引入功率控制因素,在终端设备代价值计算中增加了功率切换时的代价损失,这有助于更为精细地描述终端设备的计算卸载中的决策内容,进而有效地提高用户体验、减少模型误差。
结合第一方面,在第一方面的某些实现方式中,第一卸载决策还包括第二工作功率,第二工作功率为MADRL算法达到终止条件时,第一计算任务的最小代价值对应的工作功率。
结合第一方面,在第一方面的某些实现方式中,当第一卸载决策指示第一终端设备卸载第一计算任务至第一边缘节点进行计算时,第一终端设备以休眠功率工作。
结合第一方面,在第一方面的某些实现方式中,方法还包括:第一终端设备使用第一参数对第一时延开销进行动态调节,第一参数用于表示第一终端设备与第一边缘节点处理计算任务的技术差异。
上述技术方案中,引入技术差异参数,能够更为精细地描述计算卸载中的决策内容。
结合第一方面,在第一方面的某些实现方式中,方法还包括:第一终端设备使用第二参数对第一能耗开销和第二能耗开销进行动态调节,第二参数用于表示第一终端设备对于能耗开销的敏感程度。
上述技术方案中,引入能量权衡参数,能够更为精细地描述计算卸载中的决策内容。
第二方面,提供了一种方法,该方法包括:第一边缘节点接收一个或多个任务的状态,一个或多个任务的状态包括第一终端设备发送的第一计算任务的第一状态,第一边缘节点是为一个或多个第二终端设备提供计算资源的边缘节点,且第一终端设备为一个或多个第二终端设备中的一个终端设备;第一边缘节点根据一个或多个任务状态,确定第二卸载决策,第二卸载决策包括第一边缘节点对一个或多个第二终端设备的计算资源分配信息;第一边缘节点向一个或多个第二终端设备广播第二卸载决策。
上述技术方案中,边缘节点接收所服务的一个或多个终端设备发送的计算任务的状态,并根据接收到的一个或多个计算任务的状态确定计算资源的分配情况,然后边缘节点向所服务的终端设备广播计算资源的分配情况,终端设备根据该资源分配情况确定是否将计算任务卸载至边缘节点进行计算,从而使得边缘节点和终端设备能够在实际决策中对于环境有更广泛的感知能力,进而有效地提升两者的决策收益。
结合第二方面,在第二方面的某些实现方式中,第一边缘节点接收第一终端设备发送的第一计算任务;第一边缘节点确定第一计算任务的计算结果;第一边缘节点向第一终端设备发送第一计算任务的计算结果。
结合第二方面,在第二方面的某些实现方式中,第一边缘节点根据一个或多个任务的状态确定第二卸载决策,包括:第一边缘节点根据一个或多个任务的状态更新第一边缘节点的第三状态得到第一边缘节点的第四状态,其中,第三状态是第一边缘节点接收到一个或多个任务状态之前的状态;第一边缘节点根据第四状态确定第一边缘节点的代价值,第一边缘节点的代价值为第一边缘节点为一个或多个计算任务分配计算资源的开销;第一边 缘节点根据第一边缘节点的代价值确定第二卸载决策。
结合第二方面,在第二方面的某些实现方式中,第一边缘节点根据第四状态确定第一边缘节点的代价值,包括:第一边缘节点根据第四状态,使用多智能体深度强化学习MADRL算法中的第一代价函数和第二代价函数确定第一边缘节点的代价值;第一代价函数包括卸载开销函数和本地计算开销函数,其中,卸载开销函数用于确定一个或多个任务的卸载开销,本地计算开销函数用于计算一个或多个任务的本地开销;第二代价函数包括平均代价函数和公平代价函数,其中,平均代价函数用于根据一个或多个任务的卸载开销和本地开销确定一个或多个任务的平均开销,公平代价函数用于根据使用第一边缘节点计算资源的第二终端设备的数量确定第一边缘节点的公平代价;第一边缘节点根据一个或多个任务的平均开销与第一边缘节点的公平代价确定第一边缘节点的代价值。
上述技术方案中,边缘节点的代价值权衡了多种因素,边缘节点决策项保证了所服务的终端设备的平均体验,同时提升了终端设备在资源利用上的公平性,即在保证资源高效分配的同时也避免了服务用户的数量过少,使边缘节点的代价值反映的现实环境更为全面。
结合第二方面,在第二方面的某些实现方式中,第一边缘节点根据第一边缘节点的代价值确定第二卸载决策,包括:第一边缘节点根据MADRL算法迭代更新第一边缘节点的状态和第一边缘节点的代价值;在MADRL算法达到终止条件的情况下,第一边缘节点根据第一边缘节点的最小代价值确定第二卸载决策。
第三方面,本申请提供了一种通信装置,所述通信装置具有实现第一方面或其任意可能的实现方式中的方法的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的单元。
第四方面,本申请提供了一种通信装置,所述通信装置具有实现第二方面或其任意可能的实现方式中的方法的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的单元。
第五方面,本申请提供一种通信装置,包括接口电路和处理器,所述接口电路用于接收计算机代码或指令,并传输至所述处理器,所述处理器运行所述计算机代码或指令,第一方面或其任意实现方式中的方法被实现。
第六方面,本申请提供一种通信装置,包括接口电路和处理器,所述接口电路用于接收计算机代码或指令,并传输至所述处理器,所述处理器运行所述计算机代码或指令,第二方面或其任意实现方式中的方法被实现。
第七方面,本申请提供一种通信设备,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个存储器用于存储计算机程序或指令,所述至少一个处理器用于从所述至少一个存储器中调用并运行该计算机程序或指令,使得通信设备执行第一方面或其任意可能的实现方式中的方法。
在一个示例中,所述通信设备可以为终端设备。
第八方面,本申请提供一种通信设备,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个存储器用于存储计算机程序或指令,所述至少一个处理器用于从所述至少一个存储器中调用并运行该计算机程序或指令,使得通信设备执行第二方面或其任意可能的实现方式中的方法。
在一个示例中,所述通信设备可以为边缘节点。
第九方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,所述第一方面或其任意可能的实现方式中的方法被实现。
第十方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,所述第二方面或其任意可能的实现方式中的方法被实现。
第十一方面,本申请提供一种计算机程序产品,所述计算机程序产品包括计算机程序代码,当所述计算机程序代码在计算机上运行时,所述第一方面或其任意可能的实现方式中的方法被实现。
第十二方面,本申请提供一种计算机程序产品,所述计算机程序产品包括计算机程序代码,当所述计算机程序代码在计算机上运行时,所述第二方面或其任意可能的实现方式中的方法被实现。
第十三方面,本申请提供一种无线通信系统,包括如第七方面的通信设备和第八方面所述通信设备。
附图说明
图1示出了适用于本申请实施例的系统架构的示意图。
图2是强化学习的训练过程示意图。
图3是一种基于DRL的计算卸载方法的示意图。
图4是MADRL中智能体与环境的交互过程的示意图。
图5为本申请提供的计算卸载的方法的示意性流程图。
图6为本申请提供的计算卸载的方法的示意性流程图。
图7是本申请提供的基于MADDPG的数据采集的示意性流程图。
图8是本申请提供的基于MADDPG的参数模型训练的示意性流程图。
图9是本申请提供的基于MADDPG的计算卸载的方法的示意性流程图。
图10为本申请提供的通信装置1000的示意性框图。
图11为本申请提供的通信装置2000的示意性框图。
图12为本申请提供的通信装置10的示意性结构图。
图13为本申请提供的通信装置20的示意性结构图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例的技术方案可以应用于各种通信系统,例如:长期演进(long term evolution,LTE)系统、LTE频分双工(frequency division duplex,FDD)系统、LTE时分双工(time division duplex,TDD)、通用移动通信系统(universal mobile telecommunication system,UMTS)、新无线(new radio,NR)系统等第五代(5th generation,5G)系统,卫星通信系统,以及其它未来演进的通信系统,车到其它设备(vehicle-to-X V2X),其中V2X可以包括车到互联网(vehicle to network,V2N)、车到车(vehicle to-vehicle,V2V)、 车到基础设施(vehicle to infrastructure,V2I)、车到行人(vehicle to pedestrian,V2P)等、车间通信长期演进技术(long term evolution-vehicle,LTE-V)、车联网、机器类通信(machine type communication,MTC)、物联网(internet of things,IoT)、机器间通信长期演进技术(long term evolution-machine,LTE-M),机器到机器(machine to machine,M2M)等。
参见图1,图1示出了适用于本申请实施例的系统架构的示意图。如图1所示,该系统架构中包括至少一个终端设备、至少一个无线接入节点和至少一个边缘节点。终端设备通过多种无线接入方式(包括但不限于蜂窝与Wi-Fi)与对应的无线接入点,访问到位于互联网边缘(即网络边缘)节点组成的资源池(即边缘节点构成的资源池)。
边缘节点,即表示在靠近终端设备的网络边缘侧构建的业务平台,提供存储、计算、网络等资源,将部分关键业务应用下沉到接入网络边缘,以减少网络传输和多级转发带来的宽度和时延损耗。本申请将网络边缘中具有计算能力的设备统一抽象为边缘节点。但本申请本身不涉及边缘节点内计算资源的抽象、管理与分配功能,仅指出边缘节点以统一的整体向终端设备分配计算资源的策略。
应理解,图1所示的系统架构中的终端设备、边缘节点与无线信道,根据现实情况有如下规定:在每个时刻,单个终端设备仅能接入一条无线信道,并通过该信道获取到一个边缘节点的计算资源;单个无线接入点能够以此访问到其中的一个(或多个)边缘节点,并固定提供一条无线信道给终端设备接入;单个边缘节点能够同时利用多个无线接入点向终端设备提供计算资源,并可以同时为多个终端设备提供计算资源。为了方便下文详细说明,有如下定义:终端设备D、边缘节点N、无线信道Z可分别表示为D={1,2,…,d},i∈D,N={1,2,…,n},j∈N,Z={1,2,…,z},k∈Z;终端设备连接到的边缘节点与通信的无线信道分别为:conn i∈{0}∪N和link i∈{0}∪Z。
本申请实施例中的终端设备也可以称为:用户设备(user equipment,UE)、移动台(mobile station,MS)、移动终端(mobile terminal,MT)、接入终端、用户单元、用户站、移动站、移动台、远方站、远程终端、移动设备、用户终端、终端、无线通信设备、用户代理或用户装置等。
终端设备可以是一种向用户提供语音/数据连通性的设备,例如,具有无线连接功能的手持式设备、车载设备等。目前,一些终端设备的举例为:手机(mobile phone)、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、可穿戴设备,虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程手术(remote medical surgery)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、蜂窝电话、无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字助理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备、可穿戴设备,未来6G网络中的终端设备或者未来演进的公用陆地移动通信网络(public land mobile network,PLMN)中的终端设备和/或用于在无线通信系统上通信的任意其它适合设备,本申请实施例对此并不限定。
其中,可穿戴设备也可以称为穿戴式智能设备,是应用穿戴式技术对日常穿戴进行智能化设计、开发出可以穿戴的设备的总称,如眼镜、手套、手表、服饰及鞋等。可穿戴设备即直接穿在身上,或是整合到用户的衣服或配件的一种便携式设备。可穿戴设备不仅仅是一种硬件设备,更是通过软件支持以及数据交互、云端交互来实现强大的功能。广义穿戴式智能设备包括功能全、尺寸大、可不依赖智能手机实现完整或者部分的功能,例如:智能手表或智能眼镜等,以及只专注于某一类应用功能,需要和其它设备如智能手机配合使用,如各类进行体征监测的智能手环、智能首饰等。
此外,在本申请实施例中,终端设备还可以是物联网系统中的终端设备,IoT是未来信息技术发展的重要组成部分,其主要技术特点是将物品通过通信技术与网络连接,从而实现人机互连,物物互连的智能化网络。
此外,在本申请实施例中,终端设备还可以包括智能打印机、火车探测器、加油站等传感器,主要功能包括收集数据(部分终端设备)、接收网络设备的控制信息与下行数据,并发送电磁波,向网络设备传输上行数据。
为便于理解本申请实施例,首先对本申请中涉及到的术语作简单说明。
1、边缘计算:从“云计算”这一概念中演进而出,是一种新型的分布式计算模式。具体定义为:在互联网边缘向用户提供计算与存储资源。其中,“边缘”被定义为终端设备与云数据中心之间路径上的任何网络位置,这些位置相比云数据中心更靠近于用户。一般来说,“边缘计算”是指无线链路与有线网络的交界附近,即在无线接入点附近部署资源。边缘计算具有延迟更低、能耗更低、节省带宽、隐私性强、更智能化等特点,作为实现物联网与5G愿景的关键使能技术,边缘计算(及其衍生概念)处于计算模式与网络通信的交叉点,对满足种类繁多的业务需求与精益求精的用户体验具有不可替代的作用。
2、计算卸载:是指计算设备将资源密集型的计算任务转移到单独的处理器或其他设备的行为。在本申请中,计算设备即是指终端设备,资源密集型的计算任务是指需要花费一定计算才能完成的任务,而转移的位置则是网络边缘处的边缘节点。具体地,其典型过程为:终端设备将计算任务卸载到边缘节点,边缘节点对计算任务进行处理,终端设备接收边缘节点相应任务的计算结果。
与云计算中的计算卸载方法类似,可能的一种利用边缘计算提升终端性能的方式即是:终端设备将计算密集的任务卸载(offload)到网络边缘执行,并接收对应的运行结果以达到计算卸载的目的。具体来说,面向边缘计算有两方面需要具体考虑:一方面,终端设备存在着运算能力较低、能量敏感度高、支持功能较少等不足;另一方面,网络边缘也暴露出计算资源分散、供电方式不同、系统架构多样等现象。因此,如何管理与分配网络边缘的计算能力成为亟需关注的重要问题。
3、深度强化学习(deep reinforcement learning,DRL):强化学习是机器学习中的一个领域。参见图2,图2是强化学习的训练过程示意图。如图2所示,强化学习主要包含五个元素:智能体(agent)、环境(environment)、状态(state)、动作(action)与奖励(reward),其中,智能体的输入为状态,输出为动作。当前技术中,强化学习的训练过程为:通过智能体与环境进行多次交互,获得每次交互的动作、状态、奖励;将这多组(动作,状态,奖励)作为训练数据,对智能体进行一次训练。采用上述过程,对智能体进行下一轮次训练,直至满足收敛条件。其中,获得一次交互的动作、状态、奖励的过程如图2所示,将环境当前状态s(t)输入至智能体,获得智能体输出的动作a(t),根据环境在动作a(t)作用下的相关性能指标计算本次交互的奖励r(t),至此,获得本次交互的状态s(t)、动作a(t)与奖励r(t)。记录本次交互的状态s(t)、动作a(t)与奖励r(t),以备后续用来训练智能体。还记录环境在动作a(t)作用下的下一个状态s(t+1),以便实现智能体与环境的下一次交互。将强化学习和深度学习相结合,就得到了深度强化学习。深度强化学习仍然符合强化学习中主体和环境交互的框架,不同的是,智能体(agent)中使用深度神经网络进行决策。深度强化学习是一种综合了深度学习与强化学习的人工智能方法,在诸如动态决策、实时控制、图像感知等复杂问题中具有广泛的应用,并产生了一些基于DRL的计算卸载方法。其中,针对计算卸载的已有方法中涉及的算法包括且不限于:深度Q网络(deep Q-network,DQN)、深度确定性策略梯度(deep deterministic policy gradient,DDPG)、Actor-Critic(演员-评委)算法。基于DRL的方法模型中主要包括智能体和与其交互的环境。其中,终端设备被视作决策与执行的实体,其采用深度神经网络来感知环境的当前状态,并通过类似强化学习的方式与环境互动并调整策略,最后经过迭代多次的感知-调整过程逐渐向最优策略靠拢。
4、多智能体深度强化学习(multi-agent deep reinforcement learning,MADRL):是一种融合了多智能体系统中深度学习与强化学习的方法,其中,深度学习能够利用深度神经网络从信息中有效地提取特征,强化学习通过与环境的动态交互以不断地强化决策能力,多智能体系统则强调智能体在关注环境因素的同时也关注智能体之间决策的相互影响。因此,MADRL适用于描述复杂环境下多个智能体决策与交互的过程,在诸如机器人协作、分布式控制、协同决策等诸多领域有着较为广泛的应用。具体地,MADRL针对特定问题构建了一个包含多个智能体的系统,其中每个智能体都使用深度强化学习(deep reinforcement learning,DRL)方法描述其决策与交互过程。相比于通常的(单智能体)深度强化学习,多智能体系统中每个智能体的最优解不仅受到环境变量的制约,也同时受到其它智能体行为的约束与影响。
5、多智能体深度确定性策略梯度(multi-agent deep deterministic policy gradient,MADDPG)算法:是一种典型的多智能体深度强化学习算法,是深度确定性策略梯度(deep deterministic policy gradient,DDPG)算法在多智能体中的延伸与拓展,其中每个智能体分别运行一个DDPG模型。MADDPG算法通过集中式训练、分布式执行的方法实现多智能体的决策过程,在训练时需要知晓其他智能体的决策信息,而执行时仅需局部信息即可做出决策。
6、计算功率控制:是指终端设备在处理计算任务的过程中调整计算功率以权衡业务延迟与能量耗费的行为。由于边缘计算中终端设备是能量受限的,并且在能耗敏感程度上具有差异,因此控制本地处理任务的计算功率能够有效提升用户体验。
本申请中边缘计算卸载问题中的决策实体包括两种:终端设备(计算卸载方)与边缘节点(资源提供方)。其中,终端设备在生成计算密集的任务时将决定是在终端设备本地处理该任务(即不执行计算卸载),还是将计算任务打包通过网络传输给边缘节点,并接收边缘节点处理任务后返回的结果。边缘节点则需要根据当前的环境情况与终端状态,动态地统计当前可用资源并将计算资源分配给需要的终端设备。资源受限的终端设备需要根据无线环境、计算能力与能量模式等当前状态,确定终端设备中待处理的计算任务与可用 的计算资源之间的对应关系,即决策在何处、以怎样的方式执行哪个计算任务。一方面,本地执行避免了在网络中传输任务而造成的开销;另一方面,计算卸载则减少了执行计算任务的时间与能耗。
参见图3,图3是一种基于DRL的计算卸载方法的示意图。
以Actor-Critic算法为例,在一个无线基站的覆盖范围中存在着多个终端设备,并且每个终端设备都存在着一个计算密集且延迟敏感的任务需要完成。终端设备可以选择将计算任务在本地进行处理,也可以选择通过基站将计算任务卸载到远端。由于计算卸载过程中的无线传输速率受到背景噪声与其它终端设备的通信干扰,因此终端设备需要权衡本机处理与计算卸载的效益大小进行决策。这里简单的介绍一下该Actor-Critic算法的计算流程:当终端设备运行Actor-Critic算法时,将会同时维护两个神经网络Actor与Critic,并分别对应着两套策略参数。其中,Actor的输入为当前状态,输出为当前状态下能够获得最高奖励的策略动作,Critic的输入则为当前状态、环境返回的下一个状态与奖励,输出为环境返回奖励与自身估计的奖励之间的误差(TD_error)。
结合图3,具体地,Actor-Critic算法的训练过程如下所述:(1)Actor首先根据环境当前状态S决策行动A。(2)环境根据状态S与行动A返回下一个状态S′与奖励R。(3)Critic根据当前状态S、下一个状态S′与奖励R得到估计误差TD_error,并根据该误差调整自身的策略参数;(4)Actor根据状态S、行动A与误差TD_error也调整自身的策略参数;(5)终端将当前状态S赋值为下一个状态S′,即S=S′,重复步骤(1)到(4)的过程继续训练。相对应地,基于Actor-Critic的求解方法在具体执行时,仅需使用Actor即可根据环境当前状态S决定是否卸载的动作A。
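作为示例而非限定,上述Critic估计误差TD_error的计算可以用如下Python片段示意(其中函数名与折扣因子gamma均为说明用的假设,V_s与V_s_next表示Critic对当前状态S与下一个状态S′的价值估计):

```python
def td_error(R, V_s, V_s_next, gamma=0.99):
    """Critic的估计误差:TD_error = R + γ·V(S′) − V(S)。"""
    return R + gamma * V_s_next - V_s
```

训练时,Critic依据该误差调整自身的策略参数,Actor则结合状态S、行动A与该误差调整自身的策略参数。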
上述单智能体DRL假设决策环境是稳定的,不会因为决策本身而发生改变,这在多个终端设备同时决策是否计算卸载时是难以成立的。具体地,是否计算卸载这一决策对应着终端设备在无线信道与外部计算资源上的竞争关系,同时会影响到决策奖励。因此,对于每个终端设备来说,自身的决策过程都可能会导致其它终端设备的决策变化,进而引发DRL中环境的动态变化。而当DRL对于环境稳定的假设不成立时,算法收敛将会不再稳定,进而造成求解方法的性能退化。
此外,基于DRL的已有方法着重考虑了智能体与环境间的交互过程,而较少关注智能体与智能体之间的交互机制与信息内容,这使得智能体难以合理地从环境中获取到更多的有效信息,进而限制决策收益的进一步提升。
参见图4,图4是MADRL中智能体与环境的交互过程的示意图。与单智能体DRL类似,MADRL中的每个智能体仍旧采用了一套DRL的方法,即包含着一个感知与控制的系统,并由五个部分组成:智能体、决策动作、状态、奖励与环境。智能体是系统中的决策实体,图4中的输入S′是反映当前环境对于智能体的一个状态,输出A是给定状态下该智能体所执行的动作。所有的智能体共享着同一个环境,但动作、状态与奖励往往不同。环境用于与多个智能体进行交互,维护着所有智能体的当前状态集合,向智能体分别展示相应的当前状态,并根据所有智能体的动作产生对应的奖励与下一个状态集合。
MADRL的训练过程即是求取给定环境中智能体最优策略的过程,该过程通过上述多个智能体与环境的不断交互来实现:每个智能体从环境中获取到一个高维度的观测状态S,利用类似深度学习(deep learning,DL)的方法来感知并分析该观测状态,随后根据预期 回报来评价各个可行动作,并根据策略在当前状态下做出决策动作,最后环境对所有智能体的决策动作统一做出反应,从而得到下一个状态集合S′与相应的奖励集合R。通过不断循环以上过程,多智能体最终可以得到给定环境下的最优策略。
特别地,与单智能体DRL的不同之处在于,MADRL中的策略学习是一种集中式的过程。每个智能体在决策时都能够知晓所有智能体对于当前观测状态的策略,并可以获知所有智能体的决策奖励集合R与下一个状态集合,这样就能够避免学习过程中智能体决策相互影响造成的非平稳性。在MADRL的策略推断过程中,每个智能体根据自身对于环境的观测状态独立地做出决策。以上所述的策略学习与推断过程是一种类似于群体行为的人工智能方法。
为了使智能体(即终端设备)能从环境中获取到更多的有效信息,更好地决策是否卸载计算任务这一基础决策,本申请提供一种基于多智能体深度强化学习的计算卸载的方法,具体的,通过优化并权衡业务延迟与能量耗费这两个主要指标,从而能够进一步提升计算卸载任务的决策收益,达到增强用户体验与提高资源效用的目的。
由上可知,在终端设备执行计算卸载的过程中,终端设备的收益由两个阶段的性能表现所构成:网络传输与任务计算,其中,任务计算的消耗时间受到边缘节点的资源分配所影响,而网络传输的处理代价则由无线链路传输与有线链路传输两部分构成。考虑到端-边(终端设备-边缘节点)交互中的无线网络环境更为复杂,受到环境影响与传输干扰更为严重,因此本申请关注无线链路中的环境噪声与带宽争用现象。其中,环境噪声的范围包括背景噪声与不参与计算卸载决策的其它设备在该信道上造成的通信干扰,带宽争用的范围包括所有使用该信道并参与计算卸载决策的终端设备相互的通信干扰。此外,由于当前的无线环境设计与部署时已经充分考虑到了接入点信号的相互干扰,因此本申请默认接入点的部署是合理的,这使得终端设备彼此之间的通信干扰成为影响网络传输速率的主要因素。在本申请的典型场景中,计算卸载过程的处理时间通常在秒级及以下,在这种时间粒度中终端设备的移动对于边缘节点与无线接入点的连接状态是可以预期的,因此本申请在考虑计算卸载决策时默认计算卸载的决策与执行过程中终端设备与边缘节点、终端设备与无线接入点的链路状态不发生改变。
参见图5,图5为本申请提供的计算卸载方法的示意性流程图。
S510,第一终端设备向第一边缘节点发送第一计算任务的第一状态。
对应的,第一边缘节点接收第一终端设备发送的第一状态。
需要说明的是,由于一个边缘节点往往会服务一个或多个终端设备,所以第一边缘节点除了接收到第一终端设备发送的第一状态信息外,还可能会接收到所服务的其他终端设备发送的计算任务的状态。例如:第一边缘节点为边缘节点j,那么边缘节点j服务的终端设备可以表示为:i∈D且conn i=j(即第二终端设备的一例)。
本申请中,任意一个终端设备的计算任务的状态可以包括但不限于:计算任务的数据流长度、计算任务需耗费的时间、计算任务的惩罚值、以及其他计算任务相关的参数等。
作为示例而非限定,本申请中使用以下所述的方式定义计算任务的状态。每个特定的计算任务状态能够表示为:M={mt,mc,md,mp}。其中,mt代表传输计算任务时所需的数据流长度,单位为字节;mc代表处理该计算任务所需要耗费的时钟周期数,单位为赫兹;md代表计算任务从开始执行到得到计算结果终端设备所能接受的最长时间,md是 一个与任务相关的固定值,在任务产生的时候由任务产生的应用给出该固定值,单位为秒;mp代表当无法成功处理该计算任务时造成的惩罚值,是终端设备与边缘节点计算收益时的一部分。例如:第一计算任务的第一状态可表示为M 1={mt 1,mc 1,md 1,mp 1}。
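作为示例而非限定,上述任务状态M={mt,mc,md,mp}可用如下Python片段示意(字段取值仅为说明用的假设值):

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    mt: int    # 传输计算任务所需的数据流长度,单位为字节
    mc: int    # 处理计算任务所需耗费的时钟周期数
    md: float  # 从开始执行到得到结果可接受的最长时间,单位为秒
    mp: float  # 无法成功处理任务时的惩罚值

M_0 = TaskState(0, 0, 0, 0)  # 无任务状态
M_1 = TaskState(mt=2_000_000, mc=10**9, md=1.0, mp=5.0)  # 示例任务状态
```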
第一边缘节点根据接收到的一个或多个计算任务的状态确定第二卸载决策,第二卸载决策包括第一边缘节点对所服务的终端设备的计算资源的分配信息。
可选的,第一边缘节点可以根据接收到的一个或多个的计算任务的状态使用MADRL算法确定第二卸载决策。关于MADRL算法具体的决策过程这里暂不展开叙述,在图6对应的流程图中会做详细的介绍。
S520,第一终端设备接收第一边缘节点发送的第二卸载决策。
对应的,第一边缘节点向第一终端设备发送第二卸载决策。
S530,第一终端设备根据第二卸载决策确定第一计算任务的第一卸载决策。
其中,第一卸载决策用于指示所述第一终端设备在本地计算第一计算任务或者第一终端设备卸载第一计算任务至第一边缘节点进行计算。
可选的,第一卸载决策还可以包括第二工作功率。第二工作功率为第一终端设备根据MADRL算法确定的在本地计算第一计算任务的工作功率。
需要说明的是,当第一终端设备卸载第一计算任务至第一边缘节点进行计算时,终端设备的工作功率为第一终端设备的休眠功率。
可选的,第一终端设备可以根据接收到的第二卸载决策使用MADRL算法确定第一计算任务的第一卸载决策。关于MADRL算法具体的决策过程这里暂不展开叙述,在图6对应的流程图中会做详细的介绍。
上述技术方案中,描述了计算卸载过程中终端设备与边缘节点的交互过程与交互信息的具体内容,使得双方能够在实际决策中对于环境有更广泛的感知能力,这能够有效地提升以上两者的决策收益,同时信息交互过程简单、易于实现。
参见图6,图6为本申请提供的计算卸载方法的示意性流程图。在该流程中终端设备与边缘节点的决策过程是一个闭环流程,其静止状态在于“等待”步骤,并在每个时间片开始时触发执行流程。
本申请中的决策实体包括两种,为一个或多个终端设备与一个或多个边缘节点,以终端设备i和边缘节点j为例进行说明,该流程实施过程主要包括终端设备i和边缘节点j基于MADRL算法离线训练确定计算卸载的决策过程与执行过程,其中,边缘节点j为终端设备i提供计算资源,即conn i=j。
应理解,在本申请中,MADRL算法的离线训练过程可以执行在任何位置,仅需要将训练好的边缘节点参数模型部署在对应的边缘节点,并将训练好的一个或多个终端设备的参数模型部署在每一个终端设备即可,从而实现基于MADRL算法的计算卸载方法。
终端设备i计算卸载过程如下:
S601,终端设备i(即第一终端设备的一例)的决策过程处于等待状态,当每个时间片开始时执行S602。
该方法中基于MADRL算法的时间建模方式与其他相关方法中对于时间的建模方式相同,本实施例中将连续的时间划分为长度为δ秒且不相交的时间片(δ>0)。实际应用中可以根据待卸载业务的具体情况来确定时间片的长度δ,本申请对此不做具体限定。
S602,终端设备i判断是否有任务待处理,如果没有,跳转至S601,否则执行S603。
S603,终端设备i判断当前是否正在处理计算任务。
终端设备i处理计算任务的执行过程,是一个在决策过程中开启新线程,在新的线程中以决策结果执行计算任务的流程,应理解,终端设备i在同一时间片处理一个计算任务,并且处理单个计算任务的过程可能持续数个时间片(即处理回合)。假设任务产生与执行中不考虑队列机制对于决策动作的影响,即如果终端设备在处理计算任务的过程中产生了新的计算任务事件,则会忽略新任务并产生对应的任务惩罚值。与此类似,如果终端设备在决策动作时发现任务执行时长超过限制,也将忽略该任务并产生惩罚值。关于任务的惩罚值参见S510中对mp的定义。
具体的,终端设备处理单个计算任务时处于忙状态的时间片回合数(即处理回合)可通过以下方式计算得到:round i=⌈T/δ⌉。其中,处理时间T是指从终端设备决策如何执行任务所在的时间片开始,截止到任务执行完成的时长,⌈·⌉为天花板函数,即表示向上取整。关于本地处理任务和卸载计算任务的时长计算会在后文中的【收益指标】中做详细的介绍,这里暂不展开叙述。
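上述处理回合的计算可用如下Python片段示意(仅为示例):

```python
import math

def processing_rounds(T: float, delta: float) -> int:
    """处理回合数:处理时间T(秒)按时间片长度δ(秒)向上取整。"""
    return math.ceil(T / delta)
```

例如处理时长为0.75秒、时间片长度δ=0.2秒时,任务将占用4个处理回合。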
终端设备i在每个时间片先判断是否有待处理的任务(S602),如果没有新任务且当前的round i不等于0,则在每个时间片结束的时候执行round i=max(round i-1,0)。如果有新任务,例如:新任务为计算任务X,则终端设备i继续判断当前的round i是否等于0,当不等于0时,表示系统当前正在处理任务,则忽略待处理的计算任务X并产生对应的任务惩罚值,跳转至S601,如果等于0时,表示系统当前没有任务在执行,可以执行计算任务X,跳转至S604。
S604,终端设备i向边缘节点j(即第一边缘节点的一例)发送任务摘要。
对应的,在S703中边缘节点j接收终端设备i发送的任务摘要。
该任务摘要包括待处理计算任务X的状态M i(即第一计算任务的第一状态的一例),当终端设备i确定发送成功或等待超过一段时间后执行S605。
计算任务X的状态M i可表示为M i={mt i,mc i,md i,mp i}。例如:在S602中,当终端设备i在时间片开始判断自身没有待处理的计算任务时,可表示为M 0={0,0,0,0}。
可选的,该任务摘要还包括当前信道噪声w i与当前的信道增益H i。例如:终端设备i可以以网络报文的形式将上述信息发送给边缘节点j,一种具体的消息内容形式为{M i,w i,H i}。
S605,终端设备i接收边缘节点j广播的状态信息。
对应的,在S706中边缘节点j通过连接的无线接入点向所服务的终端设备广播状态信息。
该状态信息包括边缘节点j为所服务的终端设备的资源分配策略(即第二卸载决策的一例)。关于边缘节点j广播的状态信息的具体内容参见S705中的描述,这里暂不介绍。例如:边缘节点j广播的一种具体的状态内容可以为A d+j={a (j,1),a (j,2),…,a (j,d)}。
这里边缘节点的决策序号d+j是为了与终端设备的决策序号i相区分。其中,a (j,i)∈{0,1,…,a (j,avail)}表示边缘节点j分配给终端设备i的计算资源数量,0表示不分配计算资源,a (j,avail)表示分配当前所有可用资源。
S606,终端设备i根据边缘节点j广播的状态信息与自身的静态状态(包括终端设备i对于环境的部分观测)使用MADRL算法确定行动决策(即第一卸载决策的一例)。
当时间片开始时,终端设备i自身的静态状态X i与对应的静态状态空间如下所示,需要说明的是,该静态状态并非多智能体(即多个终端设备)在深度强化学习算法中做出决策时的状态。
终端设备i在时间片开始时自身的静态状态可以为X i=(M i,conn i,link i,w i,round i)。终端设备对应的静态状态空间是指终端设备和边缘节点在不交互状态信息时,对于问题环境的部分观测所形成的状态空间,即终端设备i对应的静态状态空间可表示为{M 0,M i_1,M i_2,…,M i_max}×{conn i}×{link i}×{0,w i_1,w i_2,…,w i_max}×{0,1,…,⌈MD i/δ⌉},其中,M 0表示无任务,M i_1,M i_2,…,M i_max为终端设备i所有可能产生的计算任务类型。0,w i_1,w i_2,…,w i_max为离散化的环境干扰程度,具体可根据环境干扰的波动范围确定粒度。MD i=max(md i_1,md i_2,…,md i_max),即取终端设备i的所有计算任务中执行时长的最大值。
终端设备i的状态根据接收到的边缘节点j的资源分配状况,将静态状态X i变化为X′ i=X i∪a (j,i)(即第一计算任务的第二状态的一例)。
网络终端根据新的状态X′ i基于MADRL算法离线训练中的收益指标计算执行第一计算任务的代价值,直到达到算法终止条件,根据第一计算任务的最小代价值得到最终的行动决策A i={c i},其中:c i∈{0,1}表示终端设备i针对计算任务是否执行计算卸载决策,0表示终端设备i本地执行计算任务X,1表示终端设备i卸载计算任务X至边缘节点j进行计算。至此完成基于MADRL的计算卸载决策过程。关于收益指标的描述参见后文,这里暂不展开叙述。
可选的,终止条件包括收敛标准。例如:算法执行到一定程度,再执行下去难以得到更小的代价值,这时认为算法收敛达到终止条件。
可选的,终止条件包括时间标准。例如:预先给定算法的执行时间,迭代次数等。
应理解,本申请对MADRL算法的终止条件不做具体限定。可选的,该行动决策还可以包括计算功率因子(即第二工作功率的一例)。例如:终端设备i针对计算任务X的行动决策可表示为:A i={c i,p i},p i∈{0,1,…,p max}表示当终端设备i本地执行任务时的功率因子,0表示以休眠功率处理任务,p max表示以最大功率处理任务。
应理解,这里的p i为所述MADRL算法达到终止条件时,第一计算任务的最小代价值对应的工作功率。
需要说明的是,这里的p i表示功率的档位,并不代表具体的功率值。例如:0表示终端设备以最低档位的功率值处理任务,这里的最低档位对应的功率值为休眠功率P sleep,P sleep为一个相对较低的功率值,并不表示休眠功率为0,p max表示以最大功率档位处理任务,实际处理任务的功率为终端设备具有的最大的工作功率值P max
具体地,终端设备i的功率因子对应处理计算任务的频率为:F i=F sleep+(p i/p max)·(F max-F sleep)。其中,F sleep与F max是与决策选项无关的常数值,分别表示终端设备在休眠功率与最大功率时对应处理计算任务的频率值。
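功率因子到计算频率的一种可能映射方式可用如下Python片段示意(此处假设为休眠频率与最大频率之间的线性插值,该映射方式仅为说明用的假设):

```python
def freq_from_power_factor(p: int, p_max: int, f_sleep: float, f_max: float) -> float:
    """按功率档位p∈{0,…,p_max}在休眠频率f_sleep与最大频率f_max之间取值(线性插值为示例假设)。"""
    assert 0 <= p <= p_max
    return f_sleep + (p / p_max) * (f_max - f_sleep)
```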
具体地,考虑到终端设备往往提供了系统API(Application Programming Interface,应用程序接口)以调整设备的能量模式,进而决定处理计算任务的速度,因此在本申请中设计了功率因子来表示调整计算频率的功能。举例来说,当终端设备决定以本地执行的方式执行任务,并以p i作为处理计算任务的功率因子之后,终端设备将首先判断p i是否等于0:如果p i不等于0(即不是休眠功率)则意味着终端设备需要从休眠功率切换至p i所对应的功率以执行计算任务,因此会调用系统API切换计算频率,这里将会产生一定的时间延迟与能量开销,关于本地处理任务时间延迟与能量开销计算会在后文中的【收益指标设定】中做详细的介绍,这里暂不展开叙述。
此外,对于终端设备的决策包括以下约束条件:
①当终端设备决定计算卸载时则调节功率至休眠功率,即:当c i=1时,则有p i=0;
②当终端设备无法访问边缘节点时则仅能本地执行任务,即:若conn i=0或link i=0,则c i=0。
终端设备i判断行动决策A i中任务处理时长是否超时,当T i小于md i时,终端设备i确定计算任务X的处理回合round i,然后跳转至S607,当T i大于任务时间限制md i时,表示执行计算任务X会超时,则终端设备i直接忽略该任务并产生相应的惩罚值mp i,然后跳转至S601。
需要说明的是,当T i等于md i时,终端设备i根据实际计算任务的情况执行601或607,本申请不做具体限定。
S607,终端设备i开启新线程以决策方式处理计算任务X,之后跳转至S601。
开启的新线程将获取决策A i和计算任务X的相关信息,并以异步方式开始执行S608至S612。
需要说明的是,终端设备i计算任务X的执行过程,是一个由决策过程中开启新线程后,在新的线程中执行的流程步骤,因此决策过程与执行过程不会相互阻塞。
需要说明的是,终端设备i在同一时间片仅能处理一个计算任务,并且处理单个计算任务的过程可能持续数个时间片(即处理回合)。
终端设备i处理计算任务X的新线程步骤如下所示:
S608,终端设备i判断当前决策A i的决策行动方式是否是计算卸载方式,若是则执行S609,否则跳转至S611。
S609,终端设备i向边缘节点j发送处理计算任务X所需的信息。
需要说明的是,由于该过程中传输的数据流相对较大,因此无线传输速率会影响处理时延,而无线传输速率同时还受到环境噪声与带宽争用的影响。
S610,终端设备i接收边缘节点j返回的任务X的处理结果,之后跳转至S612。
S611,终端设备i决定以决策A i={c i,p i}中的计算功率因子p i处理计算任务X,处理完成得到结果后跳转至S612。
S612,终端设备完成了计算任务X的处理过程并返回计算结果,之后结束该线程。
相对应地,边缘节点j的计算卸载过程如下所示:
S701,边缘节点j的决策过程处于等待状态,边缘节点j执行S702或S703。
可选的,边缘节点j在每个时间片开始时执行S702。
可选的,边缘节点j先执行S703,S704再执行S702。
S702,确定边缘节点j当前可用的计算资源。
S703,边缘节点j开始接收所服务的终端设备的任务摘要。
由于一个边缘节点往往会服务一个或多个的终端设备,所以一个边缘节点可能会接收到所服务的终端设备发送的一个或多个任务摘要,因此该过程将保持一段时间。其中,边缘节点j服务的终端设备可以表示为:i∈D且conn i=j。
S704,边缘节点j停止接收任务摘要,此时确定当前时间片中所服务的终端设备的任务摘要信息,之后跳转至S705。
边缘节点j接收到的一个或多个任务摘要中包括终端设备i发送的计算任务X的任务摘要,该任务摘要包括计算任务X的状态信息M i
可选的,该任务摘要还包括当前信道噪声w i与当前的信道增益H i。例如:对应S604,边缘节点j接收到终端设备i发送的网络报文{M i,w i,H i}。
S705,边缘节点j根据接收到的一个或多个任务摘要,采用MADRL的方法决策确定当前可用资源的分配策略。
边缘节点j在每个时间片的可用资源的分配策略可表示为:A d+j={a (j,1),a (j,2),…,a (j,d)}。这里边缘节点的决策序号d+j是为了与终端设备的决策序号i相区分。其中,a (j,i)∈{0,1,…,a (j,avail)}表示边缘节点j分配给终端设备i的计算资源数量,0表示不分配计算资源,a (j,avail)表示分配当前所有可用资源。具体地,由分配的计算资源数量对应的任务运算频率为:f i=f (conni,i)=a (conni,i)*f unit。其中f unit为决策选项无关的常数数值,单位为赫兹,表示每单位计算资源数量代表的运算频率。
其中,边缘节点j的静态状态可表示为:X d+j=(g (j,1),g (j,2),…,g (j,d))。其中,g (j,i)为边缘节点j在本轮决策前已经分配给终端设备i使用的计算资源数量,即边缘节点j正在使用的资源数量。
给定特定的时间片t与边缘节点j,决策与静态状态之间存在着以下约束,即当前可分配资源与已占用资源的相互关系:a (j,avail)=A j-Σ i∈D g (j,i),其中A j为边缘节点j的计算资源总量。此外,g (j,i)在任务执行流程中的变化流程有:首先,当终端设备i以计算卸载的方式c i=1设置处理回合round i时,则赋值g (j,i)=a (j,i)(另有在时间片开始t=1时有g (j,i)=0,即算法开始时终端设备处于空闲状态)。之后,当终端设备i在时间片t结束时有round i=0时,则再次赋值g (j,i)=0。
当边缘节点j接收到所服务的终端设备的报文集合后,边缘节点j的状态变化为X′ d+j=X d+j∪{msg i|i∈D且conn i=j}(即第一边缘节点的第四状态),其中,X d+j(即第一边缘节点的第三状态)为边缘节点j接收多个计算任务摘要之前的状态,未接收到报文的消息定义为msg m={M 0,0,0}(表示终端设备m在当前时间未发送报文至边缘节点j,其中终端设备m为边缘节点j服务的终端设备)。
此外,对于边缘节点的决策包括以下约束条件:
①边缘节点仅为当前服务的终端设备分配资源,即:不存在i∈D和j∈N,使得conn i≠j且a (j,i)>0的情况。
②边缘节点本轮所分配的资源不会超过当前可用资源,即:对于任意j∈N,总有Σ i∈D a (j,i)≤a (j,avail)。
③当边缘节点已为特定计算任务分配资源后,便不会追加分配资源,即:对于所有的i∈D和j∈N,若g (j,i)>0,则有a (j,i)=0。其中,g (j,i)为当前时间片开始前边缘节点j已经为终端设备i分配的资源数量。
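作为示例而非限定,上述约束①②③可用如下Python片段进行检查(数据结构与函数名均为说明用的假设):

```python
def allocation_valid(alloc, occupied, conn, avail, j):
    """检查边缘节点j的一轮资源分配是否满足约束①②③。
    alloc[i]:本轮分配给终端i的资源数a(j,i);
    occupied[i]:本轮开始前已占用的g(j,i);
    conn[i]:终端i连接的边缘节点;avail:当前可用资源a(j,avail)。"""
    for i, a in alloc.items():
        if a > 0 and conn[i] != j:            # 约束①:仅为当前服务的终端分配资源
            return False
        if a > 0 and occupied.get(i, 0) > 0:  # 约束③:已分配过资源的任务不追加分配
            return False
    if sum(alloc.values()) > avail:           # 约束②:分配总量不超过当前可用资源
        return False
    return True
```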
基于上述约束条件,边缘节点j根据新的状态X′ d+j基于MADRL算法进行离线训练学习,直到达到算法终止条件,得到最终的行动决策A d+j={a (j,1),a (j,2),…,a (j,d)}(即第二卸载决策的一例),并将该行动决策以广播的方式通过无线接入点发送给所服务的终端设备,例如:一种具体的消息内容形式可以为A d+j={a (j,1),a (j,2),…,a (j,d)}。
S706,边缘节点j通过连接的无线接入点广播其资源分配策略,之后跳转至S707。
S707,边缘节点j开启多个新线程以执行任务处理过程,之后跳转至S701。其中,开启的线程数不应当少于所分配资源的终端设备数量。
需要说明的是,边缘节点j的计算任务执行过程,是一个由决策过程中开启新线程后,在新的线程中执行的流程步骤,因此决策过程与执行过程不会相互阻塞。此外,该步骤中边缘节点j也可通过线程池的方法以建立以下异步过程。开启的新线程将接收来自所服务的特定终端设备的计算任务,并根据决策的资源分配策略,以异步方式开始执行S708至S711。
边缘节点j处理计算任务的每个新线程的步骤如下所示:
S708,边缘节点j接收所服务的特定终端设备的计算任务信息。应理解,这里的特定终端设备为边缘节点j服务的终端设备中确定将计算任务卸载至边缘节点j计算的终端设备。
例如:S608中终端设备i的行动决策确定将计算任务X卸载至边缘节点j计算,则终端设备i向边缘节点j发送计算任务X的信息,对应的,边缘节点j接收终端设备i发送的计算任务X的信息。
S709,边缘节点j按照S605中的资源分配策略处理接收到的计算任务。
可选的,一种具体实现方式可通过资源时分复用的方式将处理器的工作时间切分为小份,并按照资源分配所占总资源量的比例的分配给各个计算线程。
S710,边缘节点j中将新线程处理完成后的计算结果发送给对应的终端设备。
S711,在完成计算任务流程之后,边缘节点j结束对应的线程。
至此终端设备i和边缘节点j完成使用MADRL算法进行计算卸载的过程,下面,主要针对S606和S705中MADRL算法的决策收益指标进行详细介绍。
终端设备和边缘节点在使用MADRL算法过程中,需要根据各自的收益指标计算执行任务的代价值,确定自身的行动策略。
终端设备的收益指标分为两种情况:本地执行和计算卸载。
第一种情况:
在S606中终端设备i对于计算任务X选择本地执行(c i=0),即A i={0,p i},其中,M i={mt i,mc i,md i,mp i}。
①本地执行任务X的时间(即S603中的任务处理时间)可由以下公式计算得出:T i=(mc i+1 (p i>0)·sc i)/F i(即第二时延开销的一例)。其中,F i表示根据功率因子确定的终端设备i处理计算任务的频率,1 (p i>0)是一个指示符函数,当其右下方的不等式成立时函数值为1,否则函数值为0,sc i代表终端设备i从休眠功率调整到p i所需要的时钟周期数,对于特定终端设备i来说是固定值。
②本地执行任务X消耗的能量可由以下公式计算得出:E i=ε·F i^2·(mc i+1 (p i>0)·sc i)(即第二能耗开销的一例)。其中,ε表示每个运算周期所消耗能量的系数,可选的,ε=10^(-11)。
第二种情况:
在S606中终端设备i对于计算任务X选择计算卸载(c i=1),即A i={1,0}。
①计算卸载执行任务X的时间可由以下公式计算得出:T i=mt i/R i(c all)+mc i/f i(即第一时延开销的一例),即为传输该特定计算任务数据流的时间mt i/R i(c all)与边缘节点的处理时间mc i/f i之和。其中,传输该特定计算任务数据流的无线传输速率R i(c all)可由以下公式计算得出:R i(c all)=W link_i·log 2(1+P i·H i/(w i+I i))。其中,c all={c 1,c 2,…,c d},表示所有终端设备是否执行计算卸载的决策集合。W link_i表示序号为link i的无线信道带宽(固定常数值),P i表示终端设备i的传输功率(本申请中假设其固定不变),H i表示终端设备i至无线接入点的信道增益。w i表示背景干扰功率,具体包括背景噪声与不参与计算卸载决策的其它无线设备在该信道上的通信干扰。I i=Σ n:c n=1,link n=link i,n≠i P n·H n,表示在当前时间片做出计算卸载决策且使用相同信道的其它设备的干扰和。
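上述无线传输速率的计算可用如下Python片段示意(按香农公式建模,变量名为说明用的假设,interferers为同信道且本时间片选择计算卸载的其它终端的(P,H)列表):

```python
import math

def wireless_rate(W, P_i, H_i, w_i, interferers):
    """无线传输速率:R = W·log2(1 + P_i·H_i / (w_i + Σ P_n·H_n))。"""
    interference = sum(P * H for P, H in interferers)
    return W * math.log2(1 + P_i * H_i / (w_i + interference))
```

同信道卸载终端越多,干扰和越大,速率越低,这正是上文所述的带宽争用现象。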
②计算卸载执行任务X的消耗能量可表示为E i=P i·mt i/R i(c all)+E tail(即第一能耗开销的一例),即数据发送能耗与接收能耗之和,其中,接收能耗包含一部分接收后等待过程所产生的尾部能耗,E tail为接收任务耗能,考虑到接收任务耗时较短,一般情况下可以将其近似为常数值。
综合上述两种效益指标,终端设备执行计算任务的代价(即收益的负值)权衡了任务执行时间与对应的能量消耗,具体表示为:C i(A,X)=α i^(c i)·T i+β i·E i,其中T i与E i分别为按决策c i对应的本地执行或计算卸载方式计算得到的任务执行时间与消耗能量。上述终端设备i的代价公式C i(A,X)(即第一代价函数的一例)表示了终端设备i根据当前任务的状态与所有决策实体(即所有的终端设备和边缘节点)的决策集合A,处理决策流程所产生的资源耗费对应的代价值。其中,α i∈(0,1](即第一参数的一例)为技术差异因子,用来权衡终端设备i与对应的边缘节点处理任务的技术差异。当终端设备与对应的边缘节点采用相同的技术方法处理计算任务时有α i=1,否则若对应的边缘节点采用了性能表现更好的技术处理计算任务,则需要通过调整该值权衡其处理代价。定义β i≥0(即第二参数的一例)为能量权衡因子,用来表示终端设备i对于能量消耗的敏感程度,当取基线值时有β i=1。
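作为示例而非限定,终端设备代价的一种组合方式可用如下Python片段示意(按决策c_i选择本地或卸载两组开销,并分别以α、β进行调节,变量含义同上文,函数名为说明用的假设):

```python
def terminal_cost(c_i, T_local, E_local, T_off, E_off, alpha_i, beta_i):
    """终端设备代价:本地执行取T+β·E;计算卸载时用技术差异因子α调节时延开销。"""
    if c_i == 0:
        return T_local + beta_i * E_local
    return alpha_i * T_off + beta_i * E_off
```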
应理解,这里的决策A i并非是终端设备i的最终行动决策,该行动决策为终端设备i使用MADRL算法迭代学习过程中的某一次决策。终端设备i的最终决策为在MADRL算法达到终止条件时,根据该收益指标计算得到计算任务X的最小代价值确定的行动决策。
边缘节点的收益指标:
边缘节点的收益指标评价权衡了其服务的终端设备的平均代价与公平性。综合上述指标,在S705中,边缘节点j的收益指标具体表示为:C d+j=ct j/cf j(即第二代价函数的一例),边缘节点j的收益指标用于计算边缘节点j在不同资源分配策略下的代价值。其中,ct j(即平均代价函数的一例)表示了当前时间片内边缘节点j服务的终端设备的平均代价,即有:
ct j=(1/|D (j,serv)|)·Σ i∈D (j,serv) C i(A,X)
其中,D (j,serv)={i|i∈D and conn i=j}
其中,cf j(即公平代价函数的一例)是根据当前正使用边缘节点j的计算资源的终端设备的数量设置的公平因子,具体可表示为:
cf j=K(|{i|i∈D (j,serv)且g′ (j,i)>0}|)
其中,g′ (j,i)为当前时间片结束时,边缘节点j已经为终端设备i分配的资源数量。K(x)为非负单调递增函数,具体可根据实际情况设计经验函数,本申请不做限定。
可选的,一种K(x)函数可以设置为K(0)=0.2,K(1)=1,K(2)=1.6,K(3)=1.8,K(4)=2.2,K(5)=2.5,K(6)=2.7,K(7)=2.8,K(x)=3(x≥8且x为整数)。
可选的,另一种K(x)函数可以设置为K(x)=log2(x+2),其中,x≥0且x为整数。
可选的,又一种K(x)函数可以设置为K(x)=x+1,其中,x≥0且x为整数。
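边缘节点收益指标C_{d+j}=ct_j/cf_j的计算可用如下示意性代码表达（这里以K(x)=log₂(x+2)为例作为公平因子，函数名为说明性假设）：

```python
import math

def edge_cost(costs, k=lambda x: math.log2(x + 2)):
    """边缘节点j收益指标的示意实现。

    costs : 当前时间片内边缘节点j服务的各终端设备的代价值列表
    k     : 非负单调递增的公平因子函数K(x)
    """
    if not costs:
        return 0.0
    ct = sum(costs) / len(costs)   # 平均代价 ct_j
    cf = k(len(costs))             # 公平因子 cf_j（这里以服务终端数作为x的近似）
    return ct / cf
```

可以看到，在平均代价相同的情况下，服务的终端数量越多，cf_j越大、代价值越小，从而鼓励边缘节点避免服务用户数量过少。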
应理解,边缘节点j的最终决策为在MADRL算法达到终止条件时,根据边缘节点的收益指标计算得到边缘节点j的最小代价值确定的行动决策。
上述技术方案中,详细描述了计算卸载过程中终端设备与边缘节点的交互过程与交互信息,使得双方能够在实际决策中对于环境有更广泛的感知能力,这能够有效地提升以上两者的决策收益,同时信息交互过程简单、易于实现,并且过程对应的开销也相对较小。
同时考虑了计算卸载方(即终端设备)与资源提供方(即边缘节点)的卸载决策与资源分配决策,提出的收益指标权衡了多种因素,同时根据现实情况定义了相应的公平因子,收益指标反映的现实环境更为全面。同时受益于MADRL算法的优势,本申请中的必要近似较少并能同时保障算法在求解问题时的稳定性。这里的必要近似是指,在求解问题时不得不引入的近似假设或者条件,而MADRL在求解时对于问题适应性较强。
另外，在终端设备本地处理计算任务的收益指标中，引入功率控制因素，特别分析并解决了功率切换时的代价损失这一问题，这有助于更为精细地描述计算卸载中的决策内容，进而有效地提高用户体验、减少模型误差。另一方面，本申请中针对边缘节点的决策项在保证所服务的终端设备的平均体验的同时，也提升了网络终端在资源利用上的公平性，即在保证资源高效分配的同时也避免了服务用户的数量过少。
下面以多智能体深度确定性策略梯度(multi-agent deep deterministic policy gradient,MADDPG)算法为例说明计算卸载的过程,作为一种典型的MADRL算法,MADDPG是DDPG为适应多智能体环境的改进算法,其中,每一个智能体都独立运行着一个DDPG。DDPG算法是Actor-Critic算法的升级版,MADDPG的改进之处是每个智能体的Critic部分都能够获取到其余智能体的动作信息,以此在训练过程中感知到其它智能体的行动策略。
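MADDPG"训练时集中、执行时分散"的核心思想可以用如下极简代码示意：训练时每个智能体的Critic可以观测到全部智能体的状态与动作，而Actor仅依赖本地观测（函数与结构均为说明性假设，并非本申请限定的实现）：

```python
def maddpg_critic_input(states, actions):
    """将所有智能体的状态与动作拼接为某个智能体Critic的输入向量，
    以体现训练过程中对其它智能体行动策略的感知。"""
    flat = []
    for s in states:   # 全部智能体的观测状态
        flat.extend(s)
    for a in actions:  # 全部智能体的动作
        flat.extend(a)
    return flat
```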
基于MADDPG的计算卸载方法包括离线训练和在线推导过程,其中,离线训练包括数据采集与训练过程。结合图7-图9对MADDPG的离线训练过程与在线推导过程进行介绍。
参见图7,图7是本申请提供的基于MADDPG的数据采集的示意性流程图。其中,训练数据采集过程的具体步骤如下:
(1)收集边缘节点信息:收集边缘节点的运算能力以及服务的终端设备列表。
(2)收集终端设备静态信息:从终端设备上收集设备相关的静态信息,包括:终端设备的运算能力、传输功率、关联情况、任务种类等。
(3)判断是否动态生成数据集，如果是，则执行(4)，否则跳转至(6)。
(4)建立距离-增益拟合模型:根据传输速率与物理距离等动态信息建立距离-增益的拟合模型,以获得当前状态中终端设备的信道增益H,并随机生成背景噪声w。
(5)模拟轨迹与任务产生事件:根据给定参数模拟终端移动轨迹与任务产生机制。假设终端移动以随机路点模式生成移动轨迹,假设计算任务以随机事件的方式发生并遵从伯努利分布。随后,跳转至(7)。
(6)收集终端动态信息:从终端上收集信道干扰、传输速率与任务产生事件等决策所需的动态信息。收集完毕后执行下一步。
(7)统计信息并制作成数据集。随后结束。
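上述步骤(5)中"计算任务以随机事件的方式发生并遵从伯努利分布"可以用如下示意性代码模拟（参数命名为说明性假设）：

```python
import random

def simulate_task_events(num_slots, p_task, seed=0):
    """按伯努利分布模拟每个时间片是否产生计算任务。

    num_slots : 模拟的时间片数量
    p_task    : 每个时间片产生任务的概率
    返回由0/1组成的事件序列，1表示该时间片产生了计算任务。
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p_task else 0 for _ in range(num_slots)]
```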
参见图8,图8是本申请提供的基于MADDPG的参数模型训练的示意性流程图。
这里仅简单介绍终端设备和边缘节点的参数模型的训练过程，关于如何使用训练好的参数模型进行计算卸载的过程参见图6中的描述，这里不再赘述。
离线训练参数模型的过程如下所述:
(1)导入离线数据集,将收集过程中制作的数据集导入训练模型。
(2)初始化每个终端设备与边缘节点的策略参数、随机函数与初始系统状态X。
对于终端设备，系统初始状态包括：初始的任务状态、网络链路情况、边缘节点连接情况、网络噪声状态，此时处理回合为0（即此时终端设备为空闲状态）。
对于边缘节点,系统初始状态为(0,0,0,…,0),表示初始时尚未分配计算资源给所服务的终端设备。
(3)执行计算卸载流程,边缘节点首先根据策略参数、观测状态与随机函数确定决策。
(4)终端设备根据策略参数、观测状态与随机函数确定相对应的决策。
(5)根据系统状态X与决策集合A,训练过程得到终端设备与边缘节点对于该轮时间片内任务执行的行动代价C,并产生下一个系统状态X’。
(6)收集上述的系统状态X、智能体动作集合A、行动代价C与下一个系统状态X'，形成经验(X,A,C,X')，保存到经验库D中。
(7)在训练过程所在的设备(终端设备或边缘节点)中,逐次执行每一个终端设备与边缘节点的训练过程,并更新相应模型的策略梯度。
(8)最后,判定训练过程是否达到了终止条件,如果达到了就结束该过程并保存模型参数,否则即继续训练过程。
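步骤(6)中经验库D的维护方式可以用如下示意性代码说明（容量上限、字段命名等均为说明性假设）：

```python
import collections

# 经验四元组 (X, A, C, X')
Experience = collections.namedtuple("Experience",
                                    "state actions costs next_state")

class ReplayBuffer:
    """经验库D的最小示意实现：容量有限，写满后淘汰最旧经验。"""
    def __init__(self, capacity=10000):
        self.buf = collections.deque(maxlen=capacity)

    def add(self, state, actions, costs, next_state):
        self.buf.append(Experience(state, actions, costs, next_state))

    def sample(self, k, rng):
        """随机抽取k条经验用于更新策略梯度。"""
        return rng.sample(list(self.buf), k)
```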
参见图9,图9是本申请提供的基于MADDPG的计算卸载的方法的示意性流程图。图9与图6中描述的计算卸载过程基本一致,这里只做简要说明。
(1)初始化：边缘节点根据获得的参数信息进行初始化过程。终端设备上传计算任务种类与运算能力等终端设备静态信息。边缘节点根据以上信息下发类似配置的终端设备模型参数，以供终端设备进行模型参数的初始化，并执行时间片的同步过程。
(2)状态感知:当终端有新任务产生时,就会在时间片开始时向对应的边缘节点发送任务摘要。边缘节点根据当前状态与终端设备的摘要信息统计并执行MADRL的推导过程,决策计算资源分配策略并广播给所服务的终端设备。
(3)决策行动:终端设备根据边缘节点的广播信息与自身状态信息执行MADRL的推导过程,在MADDPG算法中仅通过Actor参数来决策相应的行动。
(4)处理任务:终端设备与边缘节点完成决策过程,执行相应的行动。
(5)结束：当终端设备不再需要计算卸载时，则退出决策过程；当边缘节点所服务的所有终端设备都不再需要计算卸载后，则边缘节点退出服务过程。
上文描述了本申请提供的方法实施例,下文将描述本申请提供的装置实施例。应理解,装置实施例的描述与方法实施例的描述相互对应,因此,未详细描述的内容可以参见上文方法实施例,为了简洁,这里不再赘述。
参见图10,图10为本申请提供的通信装置1000的示意性框图。如图10,通信装置1000包括接收单元1100、发送单元1200和处理单元1300。
发送单元1200，用于向第一边缘节点发送第一计算任务的第一状态，其中，第一边缘节点为所述装置获取计算资源的边缘节点，第一状态包括传输第一计算任务的数据流的长度、计算第一计算任务所需耗费的时钟周期数、第一计算任务的惩罚值中的至少一个；
接收单元1100,用于接收第一边缘节点发送的第二卸载决策,第二卸载决策是由第一状态确定的,第二卸载决策包括一个或多个第二终端设备的计算资源分配信息,第二终端设备为从第一边缘节点获取计算资源的终端设备且所述装置为一个或多个第二终端设备中的一个终端设备;
处理单元1300,用于根据第二卸载决策确定第一计算任务的第一卸载决策,第一卸载决策用于指示是否卸载第一计算任务至第一边缘节点进行计算。
可选地,接收单元1100和发送单元1200也可以集成为一个收发单元,同时具备接收和发送的功能,这里不作限定。
可选地,在一个实施例中,当第一卸载决策指示该通信装置卸载第一计算任务至第一边缘节点进行计算时,发送单元1200还用于:
向第一边缘节点发送第一计算任务;接收单元1100,还用于接收第一边缘节点发送的第一计算任务的计算结果;或者当第一卸载决策指示通信装置不卸载所述第一计算任务时,处理单元1300,还用于在本地确定第一计算任务的计算结果。
可选地,在一个实施例中,处理单元1300,具体用于根据第二卸载决策更新第一计算任务的第一状态中的参数得到第一计算任务的第二状态;根据第二状态计算第一计算任务的代价值,第一计算任务的代价值包括第一计算任务的本地开销和卸载开销;根据第一计算任务的代价值确定第一计算任务的第一卸载决策。
可选地,在一个实施例中,处理单元1300,具体用于根据第二状态,使用多智能体深度强化学习MADRL算法中的第一代价函数确定第一计算任务的代价值,第一代价函数包括卸载开销函数和本地计算开销函数,其中,卸载开销函数用于确定第一计算任务的卸载开销,本地计算开销函数用于确定第一计算任务的本地开销;以及根据MADRL算法迭代更新所述装置的第一计算任务的状态和第一计算任务的代价值;当MADRL算法达到终止条件时,处理单元1300,还用于根据第一计算任务的代价值确定第一计算任务的第一卸载决策。
可选地，在一个实施例中，第一计算任务的卸载开销包括第一能耗开销和第一时延开销，其中，第一能耗开销包括所述装置将第一计算任务卸载至第一边缘节点消耗的能量，第一时延开销包括所述装置将第一计算任务卸载至第一边缘节点的时延以及第一边缘节点确定第一计算任务的计算结果的时延。
可选地，在一个实施例中，第一计算任务的本地开销包括第二能耗开销和第二时延开销，其中，第二能耗开销包括所述装置本地计算第一计算任务消耗的能量和所述装置从休眠功率P_sleep切换到第一工作功率P_1消耗的能量，第二时延开销包括所述装置本地计算第一计算任务的时延和所述装置从休眠功率切换到第一工作功率P_1的时延，第一工作功率P_1为所述装置本地计算任务的工作功率。
可选地,在一个实施例中,第一卸载决策还包括第二工作功率,第二工作功率为MADRL算法达到终止条件时,第一计算任务的最小代价值对应的工作功率。
可选地,在一个实施例中,当第一卸载决策指示所述装置卸载第一计算任务至第一边缘节点进行计算时,处理单元1300,还用于以休眠功率工作。
可选地,在一个实施例中,处理单元1300,还用于使用第一参数对第一时延开销进行动态调节,第一参数用于表示处理单元1300与第一边缘节点处理计算任务的技术差异。
可选地,在一个实施例中,处理单元1300,还用于使用第二参数对第一能耗开销和第二能耗开销进行动态调节,第二参数用于表示处理单元1300对于能耗开销的敏感程度。
在一种实现方式中,通信装置1000可以为方法实施例中的第一终端设备。在这种实现方式中,接收单元1100可以为接收器,发送单元1200可以为发射器。接收器和发射器也可以集成为一个收发器。
在另一种实现方式中,通信装置1000可以为第一终端设备中的芯片或集成电路。在这种实现方式中,接收单元1100和发送单元1200可以为通信接口或者接口电路。例如,接收单元1100为输入接口或输入电路,发送单元1200为输出接口或输出电路。
处理单元1300可以为处理装置。其中,处理装置的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。例如,处理装置可以包括至少一个处理器和至少一个存储器,其中,所述至少一个存储器用于存储计算机程序,所述至少一个处理器读取并执行所述至少一个存储器中存储的计算机程序,使得通信装置1000执行各方法实施例中由第一终端设备执行的操作和/或处理。
可选地,处理装置可以仅包括处理器,用于存储计算机程序的存储器位于处理装置之外。处理器通过电路/电线与存储器连接,以读取并执行存储器中存储的计算机程序。可选地,在一些示例中,处理装置还可以为芯片或集成电路。
参见图11,图11为本申请提供的通信装置2000的示意性框图。如图11,通信装置2000包括接收单元2100、发送单元2200和处理单元2300。
接收单元2100,用于接收一个或多个任务的状态,一个或多个任务的状态包括第一终端设备发送的第一计算任务的第一状态,所述装置是为一个或多个第二终端设备提供计算资源的边缘节点,且第一终端设备为一个或多个第二终端设备中的一个终端设备;
处理单元2300,用于根据一个或多个任务状态,确定第二卸载决策,第二卸载决策包括处理单元2300对一个或多个第二终端设备的计算资源分配信息;
发送单元2200,用于向一个或多个第二终端设备广播第二卸载决策。
可选地,接收单元2100和发送单元2200也可以集成为一个收发单元,同时具备接收和发送的功能,这里不作限定。
可选地,在一个实施例中,接收单元2100,还用于接收第一终端设备发送的第一计算任务;处理单元2300,还用于确定第一计算任务的计算结果;发送单元2200,还用于向第一终端设备发送第一计算任务的计算结果。
可选地,在一个实施例中,处理单元2300,具体用于根据一个或多个任务的状态更新第一边缘节点的第三状态得到第一边缘节点的第四状态,其中,第三状态是第一边缘节点接收到一个或多个任务状态之前的状态;处理单元2300,还用于根据第四状态确定装置的代价值,代价值为处理单元2300为一个或多个计算任务分配计算资源的开销;处理单元2300,根据代价值确定第二卸载决策。
可选地,在一个实施例中,处理单元2300,具体用于根据第四状态,使用多智能体深度强化学习MADRL算法中的第一代价函数和第二代价函数确定代价值;第一代价函数包括卸载开销函数和本地计算开销函数,其中,卸载开销函数用于确定一个或多个任务的卸载开销,本地计算开销函数用于计算一个或多个任务的本地开销;第二代价函数包括平均代价函数和公平代价函数,其中,平均代价函数用于根据一个或多个任务的卸载开销和本地开销确定一个或多个任务的平均开销,公平代价函数用于根据使用所述装置的计算资源的第二终端设备的数量确定所述装置的公平代价;处理单元2300,具体用于根据一个或多个任务的平均开销与公平代价确定所述装置的代价值。
可选地,在一个实施例中,处理单元2300,具体用于根据MADRL算法迭代更新第一边缘节点的状态和第一边缘节点的代价值;在MADRL算法达到终止条件的情况下,处理单元2300,还用于根据第一边缘节点的代价值确定第二卸载决策。
在一种实现方式中,通信装置2000可以为方法实施例中的第一边缘节点。在这种实现方式中,接收单元2100可以为接收器,发送单元2200可以为发射器。接收器和发射器也可以集成为一个收发器。
在另一种实现方式中,通信装置2000可以为第一边缘节点中的芯片或集成电路。在这种实现方式中,接收单元2100和发送单元2200可以为通信接口或者接口电路。例如,接收单元2100为输入接口或输入电路,发送单元2200为输出接口或输出电路。
处理单元2300可以为处理装置。其中，处理装置的功能可以通过硬件实现，也可以通过硬件执行相应的软件实现。例如，处理装置可以包括至少一个处理器和至少一个存储器，其中，所述至少一个存储器用于存储计算机程序，所述至少一个处理器读取并执行所述至少一个存储器中存储的计算机程序，使得通信装置2000执行各方法实施例中由第一边缘节点执行的操作和/或处理。
可选地,处理装置可以仅包括处理器,用于存储计算机程序的存储器位于处理装置之外。处理器通过电路/电线与存储器连接,以读取并执行存储器中存储的计算机程序。可选地,在一些示例中,处理装置还可以为芯片或集成电路。
参见图12,图12为本申请提供的通信装置10的示意性结构图。如图12,通信装置10包括:一个或多个处理器11,一个或多个存储器12以及一个或多个通信接口13。处理器11用于控制通信接口13收发信号,存储器12用于存储计算机程序,处理器11用于从存储器12中调用并运行该计算机程序,以使得本申请各方法实施例中由终端设备执行的流程和/或操作被执行。
例如，处理器11可以具有图10中所示的处理单元1300的功能，通信接口13可以具有图10中所示的发送单元1200和/或接收单元1100的功能。具体地，处理器11可以用于执行图5和图6中由第一终端设备内部执行的处理或操作，通信接口13用于执行图5和图6中由第一终端设备执行的发送和/或接收的动作。
在一种实现方式中,通信装置10可以为方法实施例中的第一终端设备。在这种实现方式中,通信接口13可以为收发器。收发器可以包括接收器和发射器。可选地,处理器11可以为基带装置,通信接口13可以为射频装置。在另一种实现中,通信装置10可以为安装在第一终端设备中的芯片或者集成电路。在这种实现方式中,通信接口13可以为接口电路或者输入/输出接口。
参见图13,图13为本申请提供的通信装置20的示意性结构图。如图13,通信装置20包括:一个或多个处理器21,一个或多个存储器22以及一个或多个通信接口23。处理器21用于控制通信接口23收发信号,存储器22用于存储计算机程序,处理器21用于从存储器22中调用并运行该计算机程序,以使得本申请各方法实施例中由网络设备执行的流程和/或操作被执行。
例如，处理器21可以具有图11中所示的处理单元2300的功能，通信接口23可以具有图11中所示的发送单元2200和/或接收单元2100的功能。具体地，处理器21可以用于执行图5和图6中由第一边缘节点内部执行的处理或操作，通信接口23用于执行图5和图6中由第一边缘节点执行的发送和/或接收的动作。
在一种实现方式中,通信装置20可以为方法实施例中的第一边缘节点。在这种实现方式中,通信接口23可以为收发器。收发器可以包括接收器和发射器。在另一种实现中,通信装置20可以为安装在第一边缘节点中的芯片或者集成电路。在这种实现方式中,通信接口23可以为接口电路或者输入/输出接口。
可选的,上述各装置实施例中的存储器与处理器可以是物理上相互独立的单元,或者,存储器也可以和处理器集成在一起,本文不做限定。
此外,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,使得本申请各方法实施例中由第一终端设备执行的操作和/或流程被执行。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,使得本申请各方法实施例中由第一边缘节点执行的操作和/或流程被执行。
此外,本申请还提供一种计算机程序产品,计算机程序产品包括计算机程序代码或指令,当计算机程序代码或指令在计算机上运行时,使得本申请各方法实施例中由第一终端设备执行的操作和/或流程被执行。
本申请还提供一种计算机程序产品,计算机程序产品包括计算机程序代码或指令,当计算机程序代码或指令在计算机上运行时,使得本申请各方法实施例中由第一边缘节点执行的操作和/或流程被执行。
此外,本申请还提供一种芯片,所述芯片包括处理器。用于存储计算机程序的存储器独立于芯片而设置,处理器用于执行存储器中存储的计算机程序,以使得任意一个方法实施例中由第一终端设备执行的操作和/或处理被执行。
进一步地,所述芯片还可以包括通信接口。所述通信接口可以是输入/输出接口,也可以为接口电路等。进一步地,所述芯片还可以包括所述存储器。
本申请还提供一种芯片,所述芯片包括处理器。用于存储计算机程序的存储器独立于芯片而设置,处理器用于执行存储器中存储的计算机程序,以使得任意一个方法实施例中由第一边缘节点执行的操作和/或处理被执行。
进一步地,所述芯片还可以包括通信接口。所述通信接口可以是输入/输出接口,也可以为接口电路等。进一步地,所述芯片还可以包括所述存储器。
此外,本申请还提供一种通信装置(例如,可以为芯片),包括处理器和通信接口,所述通信接口用于接收信号并将所述信号传输至所述处理器,所述处理器处理所述信号,以使得任意一个方法实施例中由第一终端设备执行的操作和/或处理被执行。
本申请还提供一种通信装置(例如,可以为芯片),包括处理器和通信接口,所述通信接口用于接收信号并将所述信号传输至所述处理器,所述处理器处理所述信号,以使得任意一个方法实施例中由第一边缘节点执行的操作和/或处理被执行。
此外,本申请还提供一种通信装置,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个处理器用于执行所述至少一个存储器中存储的计算机程序或指令,使得任意一个方法实施例中由第一终端设备执行的操作和/或处理被执行。
本申请还提供一种通信装置,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个处理器用于执行所述至少一个存储器中存储的计算机程序或指令,使得任意一个方法实施例中由第一边缘节点执行的操作和/或处理被执行。
此外,本申请还提供一种第一终端设备,包括处理器、存储器和收发器。其中,存储器用于存储计算机程序,处理器用于调用并运行存储器中存储的计算机程序,并控制收发器收发信号,以使第一终端设备执行任意一个方法实施例中由第一终端设备执行的操作和/或处理。
本申请还提供一种第一边缘节点，包括处理器、存储器和收发器。其中，存储器用于存储计算机程序，处理器用于调用并运行存储器中存储的计算机程序，并控制收发器收发信号，以使第一边缘节点执行任意一个方法实施例中由第一边缘节点执行的操作和/或处理。
此外,本申请还提供一种无线通信系统,包括本申请实施例中的第一终端设备和第一边缘节点。
本申请实施例中的处理器可以是集成电路芯片,具有处理信号的能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请实施例公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DRRAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
本申请中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。其中,A、B以及C均可以为单数或者复数,不作限定。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (36)

  1. 一种计算卸载的方法,其特征在于,包括:
    第一终端设备向第一边缘节点发送第一计算任务的第一状态,其中,所述第一边缘节点为所述第一终端设备获取计算资源的边缘节点,所述第一状态包括传输所述第一计算任务的数据流的长度、计算所述第一计算任务所需耗费的时钟周期数、所述第一计算任务的惩罚值中的至少一个;
    所述第一终端设备接收所述第一边缘节点发送的第二卸载决策,所述第二卸载决策是由所述第一状态确定的,所述第二卸载决策包括一个或多个第二终端设备的计算资源分配信息,所述第二终端设备为从所述第一边缘节点获取计算资源的终端设备且所述第一终端设备为所述一个或多个第二终端设备中的一个终端设备;
    所述第一终端设备根据所述第二卸载决策确定所述第一计算任务的第一卸载决策,所述第一卸载决策用于指示所述第一终端设备是否卸载所述第一计算任务至所述第一边缘节点进行计算。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当所述第一卸载决策指示所述第一终端设备卸载所述第一计算任务至所述第一边缘节点进行计算时,所述第一终端设备向所述第一边缘节点发送所述第一计算任务;
    所述第一终端设备接收所述第一边缘节点发送的所述第一计算任务的计算结果;或者
    当所述第一卸载决策指示所述第一终端设备不卸载所述第一计算任务时,所述第一终端设备在本地确定所述第一计算任务的计算结果。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一终端设备根据所述第二卸载决策确定所述第一计算任务的第一卸载决策,包括:
    所述第一终端设备根据所述第二卸载决策,更新所述第一计算任务的第一状态中的参数得到所述第一计算任务的第二状态;
    所述第一终端设备根据所述第二状态计算所述第一计算任务的代价值,所述第一计算任务的代价值包括所述第一计算任务的本地开销和卸载开销;
    所述第一终端设备根据所述第一计算任务的代价值,确定所述第一计算任务的第一卸载决策。
  4. 根据权利要求3所述的方法,其特征在于,所述第一终端设备根据所述第二状态计算所述第一计算任务的代价值,包括:
    所述第一终端设备根据所述第二状态,使用多智能体深度强化学习MADRL算法中的第一代价函数确定所述第一计算任务的代价值,
    所述第一代价函数包括卸载开销函数和本地计算开销函数,其中,所述卸载开销函数用于确定所述第一计算任务的卸载开销,所述本地计算开销函数用于确定所述第一计算任务的本地开销;以及
    所述第一终端设备根据所述第一计算任务的代价值确定所述第一计算任务的第一卸载决策,包括:
    所述第一终端设备根据所述MADRL算法迭代更新所述第一终端设备的第一计算任务的状态和所述第一计算任务的代价值；
    当所述MADRL算法达到终止条件时,所述第一终端设备根据所述第一计算任务的最小代价值确定所述第一计算任务的第一卸载决策。
  5. 根据权利要求4所述的方法,其特征在于,
    所述第一计算任务的卸载开销包括第一能耗开销和第一时延开销,其中,
    所述第一能耗开销包括所述第一终端设备将所述第一计算任务卸载至所述第一边缘节点消耗的能量；
    所述第一时延开销包括所述第一终端设备将所述第一计算任务卸载至所述第一边缘节点的时延以及所述第一边缘节点计算所述第一计算任务的计算结果的时延。
  6. 根据权利要求4或5所述的方法,其特征在于,
    所述第一计算任务的本地开销包括第二能耗开销和第二时延开销,其中,
    所述第二能耗开销包括所述第一终端设备本地计算所述第一计算任务消耗的能量和所述第一终端设备从休眠功率Psleep切换到第一工作功率P1消耗的能量;
    所述第二时延开销包括所述第一终端设备本地计算所述第一计算任务的时延和所述第一终端设备从所述休眠功率切换到所述第一工作功率P1的时延;
    所述第一工作功率P1为所述第一终端设备本地计算任务的工作功率。
  7. 根据权利要求6所述的方法,其特征在于,所述第一卸载决策还包括第二工作功率,所述第二工作功率为所述MADRL算法达到终止条件时,所述第一计算任务的最小代价值对应的工作功率。
  8. 根据权利要求6或7所述的方法,其特征在于,
    当所述第一卸载决策指示所述第一终端设备卸载所述第一计算任务至所述第一边缘节点进行计算时,所述第一终端设备以所述休眠功率工作。
  9. 根据权利要求5至8中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一终端设备使用第一参数对所述第一时延开销进行动态调节,所述第一参数用于表示所述第一终端设备与所述第一边缘节点处理计算任务的技术差异。
  10. 根据权利要求6至9中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一终端设备使用第二参数对所述第一能耗开销和所述第二能耗开销进行动态调节,所述第二参数用于表示所述第一终端设备对于能耗开销的敏感程度。
  11. 一种计算卸载的方法,其特征在于,包括:
    第一边缘节点接收一个或多个任务的状态,所述一个或多个任务的状态包括第一终端设备发送的第一计算任务的第一状态,所述第一边缘节点是为一个或多个第二终端设备提供计算资源的边缘节点,且所述第一终端设备为所述一个或多个第二终端设备中的一个终端设备;
    所述第一边缘节点根据所述一个或多个任务状态,确定第二卸载决策,所述第二卸载决策包括所述第一边缘节点对所述一个或多个第二终端设备的计算资源分配信息;
    所述第一边缘节点向所述一个或多个第二终端设备广播所述第二卸载决策。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    所述第一边缘节点接收所述第一终端设备发送的所述第一计算任务;
    所述第一边缘节点确定所述第一计算任务的计算结果;
    所述第一边缘节点向所述第一终端设备发送所述第一计算任务的计算结果。
  13. 根据权利要求11或12所述的方法,其特征在于,所述第一边缘节点根据所述一个或多个任务的状态确定第二卸载决策,包括:
    所述第一边缘节点根据所述一个或多个任务的状态,更新所述第一边缘节点的第三状态得到所述第一边缘节点的第四状态,其中,
    所述第三状态是所述第一边缘节点接收到所述一个或多个任务状态之前的状态;
    所述第一边缘节点根据所述第四状态确定所述第一边缘节点的代价值,所述第一边缘节点的代价值为所述第一边缘节点为所述一个或多个计算任务分配计算资源的开销;
    所述第一边缘节点根据所述第一边缘节点的代价值确定所述第二卸载决策。
  14. 根据权利要求13所述的方法,其特征在于,所述第一边缘节点根据所述第四状态确定所述第一边缘节点的代价值,包括:
    所述第一边缘节点根据所述第四状态,使用多智能体深度强化学习MADRL算法中的第一代价函数和第二代价函数确定所述第一边缘节点的代价值;
    所述第一代价函数包括卸载开销函数和本地计算开销函数,其中,所述卸载开销函数用于确定所述一个或多个任务的卸载开销,所述本地计算开销函数用于计算所述一个或多个任务的本地开销;
    所述第二代价函数包括平均代价函数和公平代价函数,其中,所述平均代价函数用于根据所述一个或多个任务的卸载开销和本地开销确定所述一个或多个任务的平均开销,所述公平代价函数用于根据使用所述第一边缘节点计算资源的第二终端设备的数量确定所述第一边缘节点的公平代价;
    所述第一边缘节点根据所述一个或多个任务的平均开销与所述第一边缘节点的公平代价确定所述第一边缘节点的代价值。
  15. 根据权利要求14所述的方法,其特征在于,所述第一边缘节点根据所述第一边缘节点的代价值确定所述第二卸载决策,包括:
    所述第一边缘节点根据所述MADRL算法迭代更新所述第一边缘节点的状态和所述第一边缘节点的代价值;
    在所述MADRL算法达到终止条件的情况下,所述第一边缘节点根据所述第一边缘节点的最小代价值确定所述第二卸载决策。
  16. 一种通信装置,其特征在于,包括:
    发送单元,用于向第一边缘节点发送第一计算任务的第一状态,其中,所述第一边缘节点为所述通信装置获取计算资源的边缘节点;
    接收单元,用于接收所述第一边缘节点发送的第二卸载决策,所述第二卸载决策是由所述第一状态确定的,所述第二卸载决策包括一个或多个第二终端设备的计算资源分配信息,所述第二终端设备为从所述第一边缘节点获取计算资源的终端设备且所述通信装置为所述一个或多个第二终端设备中的一个终端设备;
    处理单元,用于根据所述第二卸载决策确定所述第一计算任务的第一卸载决策,所述第一卸载决策用于指示是否卸载所述第一计算任务至所述第一边缘节点进行计算。
  17. 根据权利要求16所述的通信装置,其特征在于,
    当所述第一卸载决策指示所述通信装置卸载所述第一计算任务至所述第一边缘节点进行计算时，所述发送单元，用于向所述第一边缘节点发送所述第一计算任务；
    所述接收单元,用于接收所述第一边缘节点发送的所述第一计算任务的计算结果;或者
    当所述第一卸载决策指示所述通信装置不卸载所述第一计算任务时,所述处理单元,用于在本地确定所述第一计算任务的计算结果。
  18. 根据权利要求16或17所述的通信装置,其特征在于,
    所述处理单元,具体用于根据所述第二卸载决策更新所述第一计算任务的第一状态得到所述第一计算任务的第二状态;
    所述处理单元,还用于根据所述第二状态计算所述第一计算任务的代价值,所述第一计算任务的代价值包括所述第一计算任务的本地开销和卸载开销;
    所述处理单元,还用于根据所述第一计算任务的代价值确定所述第一计算任务的第一卸载决策。
  19. 根据权利要求18所述的通信装置,其特征在于,
    所述处理单元,具体用于根据所述第二状态,使用多智能体深度强化学习MADRL算法中的第一代价函数确定所述第一计算任务的代价值,所述第一代价函数包括卸载开销函数和本地计算开销函数,其中,所述卸载开销函数用于确定所述第一计算任务的卸载开销,所述本地计算开销函数用于确定所述第一计算任务的本地开销;以及
    所述处理单元,还具体用于根据所述MADRL算法迭代更新所述通信装置的第一计算任务的状态和所述第一计算任务的代价值;
    当所述MADRL算法达到终止条件时,所述处理单元,还用于根据所述第一计算任务的最小代价值确定所述第一计算任务的第一卸载决策。
  20. 根据权利要求19所述的通信装置,其特征在于,
    所述第一计算任务的卸载开销包括第一能耗开销和第一时延开销，其中，所述第一能耗开销包括所述通信装置将所述第一计算任务卸载至所述第一边缘节点消耗的能量，所述第一时延开销包括所述通信装置将所述第一计算任务卸载至所述第一边缘节点的时延以及所述第一边缘节点确定所述第一计算任务的计算结果的时延。
  21. 根据权利要求19或20所述的通信装置,其特征在于,
    所述第一计算任务的本地开销包括第二能耗开销和第二时延开销,其中,
    所述第二能耗开销包括所述通信装置本地计算所述第一计算任务消耗的能量和所述通信装置从休眠功率Psleep切换到第一工作功率P1消耗的能量;
    所述第二时延开销包括所述通信装置本地计算所述第一计算任务的时延和所述通信装置从所述休眠功率切换到所述第一工作功率P1的时延;
    所述第一工作功率P1为所述通信装置本地计算任务的工作功率。
  22. 根据权利要求21所述的通信装置,其特征在于,所述第一卸载决策还包括第二工作功率,所述第二工作功率为所述MADRL算法达到终止条件时,所述第一计算任务的最小代价值对应的工作功率。
  23. 根据权利要求21或22所述的通信装置,其特征在于,
    当所述第一卸载决策指示所述通信装置卸载所述第一计算任务至所述第一边缘节点进行计算时,所述处理单元,还用于以所述休眠功率工作。
  24. 根据权利要求20至23中任一项所述的通信装置,其特征在于,
    所述处理单元,还用于使用第一参数对所述第一时延开销进行动态调节,所述第一参数用于表示所述处理单元与所述第一边缘节点处理计算任务的技术差异。
  25. 根据权利要求21至23中任一项所述的通信装置,其特征在于,
    所述处理单元,还用于使用第二参数对所述第一能耗开销和所述第二能耗开销进行动态调节,所述第二参数用于表示所述处理单元对于能耗开销的敏感程度。
  26. 一种计算卸载的通信装置,其特征在于,包括:
    接收单元，用于接收一个或多个任务的状态，所述一个或多个任务的状态包括第一终端设备发送的第一计算任务的第一状态，所述通信装置是为一个或多个第二终端设备提供计算资源的边缘节点，且所述第一终端设备为所述一个或多个第二终端设备中的一个终端设备；
    处理单元,用于根据所述一个或多个任务状态,确定第二卸载决策,所述第二卸载决策包括所述处理单元对所述一个或多个第二终端设备的计算资源分配信息;
    发送单元,用于向所述一个或多个第二终端设备广播所述第二卸载决策。
  27. 根据权利要求26所述的通信装置,其特征在于,
    所述接收单元,还用于接收所述第一终端设备发送的所述第一计算任务;
    所述处理单元,还用于确定所述第一计算任务的计算结果;
    所述发送单元,还用于向所述第一终端设备发送所述第一计算任务的计算结果。
  28. 根据权利要求26或27所述的通信装置,其特征在于,
    所述处理单元,具体用于根据所述一个或多个任务的状态更新所述第一边缘节点的第三状态得到所述第一边缘节点的第四状态,其中,所述第三状态是所述第一边缘节点接收到所述一个或多个任务状态之前的状态;
    所述处理单元,还用于根据所述第四状态确定所述通信装置的代价值,所述代价值为所述处理单元为所述一个或多个计算任务分配计算资源的开销;
    所述处理单元,根据所述代价值确定所述第二卸载决策。
  29. 根据权利要求28所述的通信装置,其特征在于,
    所述处理单元,具体用于根据所述第四状态,使用多智能体深度强化学习MADRL算法中的第一代价函数和第二代价函数确定所述代价值;
    所述第一代价函数包括卸载开销函数和本地计算开销函数,其中,所述卸载开销函数用于确定所述一个或多个任务的卸载开销,所述本地计算开销函数用于计算所述一个或多个任务的本地开销;
    所述第二代价函数包括平均代价函数和公平代价函数,其中,所述平均代价函数用于根据所述一个或多个任务的卸载开销和本地开销确定所述一个或多个任务的平均开销,所述公平代价函数用于根据使用所述通信装置计算资源的第二终端设备的数量确定所述通信装置的公平代价;
    所述处理单元,具体用于根据所述一个或多个任务的平均开销与所述公平代价确定所述通信装置的代价值。
  30. 根据权利要求29所述的通信装置,其特征在于,
    所述处理单元，具体用于根据所述MADRL算法迭代更新所述第一边缘节点的状态和所述第一边缘节点的代价值；
    在所述MADRL算法达到终止条件的情况下,所述处理单元,还用于根据所述第一边缘节点的最小代价值确定所述第二卸载决策。
  31. 一种通信装置,其特征在于,包括处理器和接口电路,其中,所述接口电路用于接收计算机代码或指令并将其传输至所述处理器,所述处理器运行所述计算机代码或指令,使得如权利要求1-10中任一项所述的方法被执行,或者,使得如权利要求11-15中任一项所述的方法被执行。
  32. 一种通信装置，其特征在于，包括至少一个处理器，所述至少一个处理器与至少一个存储器耦合，所述至少一个处理器用于执行所述至少一个存储器中存储的计算机程序或指令，当所述计算机程序或指令被执行时，如权利要求1-10中任一项所述的方法被执行，或者，如权利要求11-15中任一项所述的方法被执行。
  33. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机指令,当所述计算机指令被运行时,如权利要求1-10中任一项所述的方法被执行,或者,如权利要求11-15中任一项所述的方法被执行。
  34. 一种包含指令的计算机程序产品,当其在计算机上运行时,使得如权利要求1-10中任一项所述的方法被执行,或者,如权利要求11-15中任一项所述的方法被执行。
  35. 一种计算机程序,当其在计算机上运行时,使得如权利要求1-10中任一项所述的方法被执行,或者,如权利要求11-15中任一项所述的方法被执行。
  36. 一种通信系统,包括如权利要求16-25任一项所述的通信装置,和,如权利要求26-30任一项所述的通信装置。
PCT/CN2021/088860 2020-05-22 2021-04-22 计算卸载的方法和通信装置 WO2021233053A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21807848.3A EP4142235A4 (en) 2020-05-22 2021-04-22 COMPUTING OFFLOADING METHOD AND COMMUNICATION APPARATUS
US17/990,944 US20230081937A1 (en) 2020-05-22 2022-11-21 Computation offloading method and communication apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010438782.2A CN113709201B (zh) 2020-05-22 2020-05-22 计算卸载的方法和通信装置
CN202010438782.2 2020-05-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/990,944 Continuation US20230081937A1 (en) 2020-05-22 2022-11-21 Computation offloading method and communication apparatus

Publications (1)

Publication Number Publication Date
WO2021233053A1 true WO2021233053A1 (zh) 2021-11-25

Family

ID=78645952

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088860 WO2021233053A1 (zh) 2020-05-22 2021-04-22 计算卸载的方法和通信装置

Country Status (4)

Country Link
US (1) US20230081937A1 (zh)
EP (1) EP4142235A4 (zh)
CN (1) CN113709201B (zh)
WO (1) WO2021233053A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114285853A (zh) * 2022-01-14 2022-04-05 河海大学 设备密集型工业物联网中基于端边云协同的任务卸载方法
CN114302435A (zh) * 2021-12-31 2022-04-08 杭州电子科技大学 一种移动边缘计算传输系统中的功耗与时延的优化方法
CN114363338A (zh) * 2022-01-07 2022-04-15 山东大学 一种基于竞争合作平均场博弈的多址接入边缘计算网络任务卸载策略的优化方法
CN114466409A (zh) * 2022-04-11 2022-05-10 清华大学 一种面向机器通信的数据卸载的控制方法和装置
CN114615265A (zh) * 2022-03-09 2022-06-10 浙江工业大学 边缘计算环境下基于深度强化学习的车载任务卸载方法
CN114786215A (zh) * 2022-03-22 2022-07-22 国网浙江省电力有限公司信息通信分公司 多基站移动边缘计算系统传输计算联合优化系统及方法
CN114785782A (zh) * 2022-03-29 2022-07-22 南京工业大学 面向异构云-边计算的通用的任务卸载方法
CN114928609A (zh) * 2022-04-27 2022-08-19 南京工业大学 物联网场景的异构云-边环境的最优任务卸载方法
CN114979014A (zh) * 2022-06-30 2022-08-30 国网北京市电力公司 数据转发路径规划方法、装置以及电子设备
CN115002409A (zh) * 2022-05-20 2022-09-02 天津大学 一种面向视频检测与追踪的动态任务调度方法
CN115022937A (zh) * 2022-07-14 2022-09-06 合肥工业大学 拓扑特征提取方法和考虑拓扑特征的多边缘协作调度方法
CN115065728A (zh) * 2022-06-13 2022-09-16 福州大学 一种基于多策略强化学习的多目标内容存储方法
CN115499875A (zh) * 2022-09-14 2022-12-20 中山大学 一种卫星互联网任务卸载方法、系统以及可读存储介质
CN115623540A (zh) * 2022-11-11 2023-01-17 南京邮电大学 一种移动设备的边缘优化卸载方法
CN116069414A (zh) * 2023-03-06 2023-05-05 湖北工业大学 一种电力物联网计算任务卸载激励优化方法和存储介质
CN116208970A (zh) * 2023-04-18 2023-06-02 山东科技大学 一种基于知识图谱感知的空地协作卸载和内容获取方法
CN117615418A (zh) * 2024-01-19 2024-02-27 北京邮电大学 一种移动感知辅助的车联网服务迁移方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648870B (zh) * 2022-02-11 2023-07-28 行云新能科技(深圳)有限公司 边缘计算系统、边缘计算决策预测方法以及计算机可读存储介质
CN114928607B (zh) * 2022-03-18 2023-08-04 南京邮电大学 面向多边接入边缘计算的协同任务卸载方法
CN116208669B (zh) * 2023-04-28 2023-06-30 湖南大学 基于智慧灯杆的车载异构网络协同任务卸载方法及系统
CN116582836B (zh) * 2023-07-13 2023-09-12 中南大学 一种任务卸载与资源分配方法、设备、介质和系统
CN117492856A (zh) * 2023-10-17 2024-02-02 南昌大学 一种金融物联网中信任评估的低延迟边缘计算卸载方法
CN117202173A (zh) * 2023-11-07 2023-12-08 中博信息技术研究院有限公司 一种面向用户隐私保护的边缘计算卸载方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337091A1 (en) * 2016-05-17 2017-11-23 International Business Machines Corporation Allocating compute offload resources
CN109240818A (zh) * 2018-09-04 2019-01-18 中南大学 一种边缘计算网络中基于用户体验的任务卸载方法
CN110418418A (zh) * 2019-07-08 2019-11-05 广州海格通信集团股份有限公司 基于移动边缘计算的无线资源调度方法和装置
CN110798858A (zh) * 2019-11-07 2020-02-14 华北电力大学(保定) 基于代价效率的分布式任务卸载方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102724741B (zh) * 2012-07-13 2015-01-07 电子科技大学 基于微周期的适用于无线传感器网络路由节点的休眠方法
CN105450684B (zh) * 2014-08-15 2019-01-01 中国电信股份有限公司 云计算资源调度方法和系统
CN106534333B (zh) * 2016-11-30 2019-07-12 北京邮电大学 一种基于mec和mcc的双向选择计算卸载方法
US10440096B2 (en) * 2016-12-28 2019-10-08 Intel IP Corporation Application computation offloading for mobile edge computing
CN110351309B (zh) * 2018-04-02 2020-10-09 中国科学院上海微系统与信息技术研究所 网络节点间计算任务卸载平衡方法、系统、介质及设备
CN108632861B (zh) * 2018-04-17 2021-06-18 浙江工业大学 一种基于深度强化学习的移动边缘计算分流决策方法
CN109684075B (zh) * 2018-11-28 2023-04-07 深圳供电局有限公司 一种基于边缘计算和云计算协同进行计算任务卸载的方法
CN109803292B (zh) * 2018-12-26 2022-03-04 佛山市顺德区中山大学研究院 一种基于强化学习的多次级用户移动边缘计算的方法
US11436051B2 (en) * 2019-04-30 2022-09-06 Intel Corporation Technologies for providing attestation of function as a service flavors
US11824784B2 (en) * 2019-12-20 2023-11-21 Intel Corporation Automated platform resource management in edge computing environments

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302435B (zh) * 2021-12-31 2023-10-03 杭州电子科技大学 一种移动边缘计算传输系统中的功耗与时延的优化方法
CN114302435A (zh) * 2021-12-31 2022-04-08 杭州电子科技大学 一种移动边缘计算传输系统中的功耗与时延的优化方法
CN114363338A (zh) * 2022-01-07 2022-04-15 山东大学 一种基于竞争合作平均场博弈的多址接入边缘计算网络任务卸载策略的优化方法
CN114363338B (zh) * 2022-01-07 2023-01-31 山东大学 一种基于竞争合作平均场博弈的多址接入边缘计算网络任务卸载策略的优化方法
CN114285853A (zh) * 2022-01-14 2022-04-05 河海大学 设备密集型工业物联网中基于端边云协同的任务卸载方法
CN114615265A (zh) * 2022-03-09 2022-06-10 浙江工业大学 边缘计算环境下基于深度强化学习的车载任务卸载方法
CN114786215A (zh) * 2022-03-22 2022-07-22 国网浙江省电力有限公司信息通信分公司 多基站移动边缘计算系统传输计算联合优化系统及方法
CN114786215B (zh) * 2022-03-22 2023-10-20 国网浙江省电力有限公司信息通信分公司 多基站移动边缘计算系统传输计算联合优化系统及方法
CN114785782A (zh) * 2022-03-29 2022-07-22 南京工业大学 面向异构云-边计算的通用的任务卸载方法
CN114466409B (zh) * 2022-04-11 2022-08-12 清华大学 一种面向机器通信的数据卸载的控制方法和装置
CN114466409A (zh) * 2022-04-11 2022-05-10 清华大学 一种面向机器通信的数据卸载的控制方法和装置
CN114928609A (zh) * 2022-04-27 2022-08-19 南京工业大学 物联网场景的异构云-边环境的最优任务卸载方法
CN114928609B (zh) * 2022-04-27 2023-02-03 南京工业大学 物联网场景的异构云-边环境的最优任务卸载方法
CN115002409A (zh) * 2022-05-20 2022-09-02 天津大学 一种面向视频检测与追踪的动态任务调度方法
CN115002409B (zh) * 2022-05-20 2023-07-28 天津大学 一种面向视频检测与追踪的动态任务调度方法
CN115065728B (zh) * 2022-06-13 2023-12-08 福州大学 一种基于多策略强化学习的多目标内容存储方法
CN115065728A (zh) * 2022-06-13 2022-09-16 福州大学 一种基于多策略强化学习的多目标内容存储方法
CN114979014A (zh) * 2022-06-30 2022-08-30 国网北京市电力公司 数据转发路径规划方法、装置以及电子设备
CN115022937B (zh) * 2022-07-14 2022-11-11 合肥工业大学 拓扑特征提取方法和考虑拓扑特征的多边缘协作调度方法
CN115022937A (zh) * 2022-07-14 2022-09-06 合肥工业大学 拓扑特征提取方法和考虑拓扑特征的多边缘协作调度方法
CN115499875A (zh) * 2022-09-14 2022-12-20 中山大学 一种卫星互联网任务卸载方法、系统以及可读存储介质
CN115499875B (zh) * 2022-09-14 2023-09-22 中山大学 一种卫星互联网任务卸载方法、系统以及可读存储介质
CN115623540B (zh) * 2022-11-11 2023-10-03 南京邮电大学 一种移动设备的边缘优化卸载方法
CN115623540A (zh) * 2022-11-11 2023-01-17 南京邮电大学 一种移动设备的边缘优化卸载方法
CN116069414B (zh) * 2023-03-06 2023-06-09 湖北工业大学 一种电力物联网计算任务卸载激励优化方法和存储介质
CN116069414A (zh) * 2023-03-06 2023-05-05 湖北工业大学 一种电力物联网计算任务卸载激励优化方法和存储介质
CN116208970B (zh) * 2023-04-18 2023-07-14 山东科技大学 一种基于知识图谱感知的空地协作卸载和内容获取方法
CN116208970A (zh) * 2023-04-18 2023-06-02 山东科技大学 一种基于知识图谱感知的空地协作卸载和内容获取方法
CN117615418A (zh) * 2024-01-19 2024-02-27 北京邮电大学 一种移动感知辅助的车联网服务迁移方法
CN117615418B (zh) * 2024-01-19 2024-04-12 北京邮电大学 一种移动感知辅助的车联网服务迁移方法

Also Published As

Publication number Publication date
EP4142235A4 (en) 2023-10-18
CN113709201A (zh) 2021-11-26
CN113709201B (zh) 2023-05-23
EP4142235A1 (en) 2023-03-01
US20230081937A1 (en) 2023-03-16

Similar Documents

Publication Publication Date Title
WO2021233053A1 (zh) 计算卸载的方法和通信装置
Sun et al. Joint offloading and computation energy efficiency maximization in a mobile edge computing system
Yang et al. DEBTS: Delay energy balanced task scheduling in homogeneous fog networks
Tran et al. Joint task offloading and resource allocation for multi-server mobile-edge computing networks
Dai et al. Joint load balancing and offloading in vehicular edge computing and networks
Wang et al. User mobility aware task assignment for mobile edge computing
Guo et al. Energy-efficient and delay-guaranteed workload allocation in IoT-edge-cloud computing systems
Zhao et al. Contract-based computing resource management via deep reinforcement learning in vehicular fog computing
Azari et al. On the latency-energy performance of NB-IoT systems in providing wide-area IoT connectivity
Kasgari et al. Model-free ultra reliable low latency communication (URLLC): A deep reinforcement learning framework
Jian et al. Joint computation offloading and resource allocation in C-RAN with MEC based on spectrum efficiency
CN110753319A (zh) 异构车联网中面向异质业务的分布式资源分配方法及系统
CN114650228B (zh) 一种异构网络中基于计算卸载的联邦学习调度方法
Nguyen et al. Joint offloading and IEEE 802.11 p-based contention control in vehicular edge computing
WO2022226713A1 (zh) 策略确定的方法和装置
Salh et al. Refiner GAN algorithmically enabled deep-RL for guaranteed traffic packets in real-time URLLC B5G communication systems
Chang et al. Offloading decision in edge computing for continuous applications under uncertainty
Li et al. A trade-off task-offloading scheme in multi-user multi-task mobile edge computing
Cárdenas et al. Bringing AI to the edge: A formal M&S specification to deploy effective IoT architectures
CN116848828A (zh) 机器学习模型分布
Gao et al. An integration of online learning and online control for green offloading in fog-assisted IoT systems
Kim et al. Adaptive scheduling for multi-objective resource allocation through multi-criteria decision-making and deep Q-network in wireless body area networks
CN116828534A (zh) 基于强化学习的密集网络大规模终端接入与资源分配方法
Alfaqawi et al. Energy harvesting network with wireless distributed computing
WO2022206513A1 (zh) 模型处理的方法、通信装置和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21807848

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021807848

Country of ref document: EP

Effective date: 20221123

NENP Non-entry into the national phase

Ref country code: DE