CN111641681A - Internet of Things service offloading decision method based on edge computing and deep reinforcement learning - Google Patents


Info

Publication number
CN111641681A
CN111641681A (application CN202010394958.9A)
Authority
CN
China
Prior art keywords: service, internet of things, offloading, decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010394958.9A
Other languages
Chinese (zh)
Inventor
胡文建
苏汉
张益辉
赵会峰
何利平
李霞
孙玲
张颖
陈瑞华
郭家伟
马岩
杨宇皓
徐良燕
吴晓云
孙静
陈方
赵灿
王琳
王珂
王飞
杨阳
郭思炎
王代远
孙莹晖
张郁
张伟
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Hebei Electric Power Co Ltd
Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Hebei Electric Power Co Ltd, Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010394958.9A priority Critical patent/CN111641681A/en
Publication of CN111641681A publication Critical patent/CN111641681A/en
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y 40/00 IoT characterised by the purpose of the information processing
    • G16Y 40/20 Analytics; Diagnosis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1031 Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an Internet of Things service offloading decision method based on edge computing and deep reinforcement learning, belonging to the technical field of Internet of Things applications. The invention addresses the rigid network structure inherent in traditional IoT: it uses SDN technology to separate the network control plane from the data forwarding plane, thereby obtaining a centralized view of the whole network, ensuring network security, and gaining the strong resource management and orchestration capability brought by programmable interfaces.

Description

Internet of Things service offloading decision method based on edge computing and deep reinforcement learning
Technical Field
The invention belongs to the technical field of Internet of Things applications, and particularly relates to an Internet of Things service offloading decision method combining edge computing, SDN technology and a deep reinforcement learning algorithm.
Background
With the rapid development of the Internet of Things (IoT) and the continuous emergence of new applications (such as smart homes, smart cities and intelligent transportation), users' requirements on network quality of service (QoS) keep rising. Cloud computing offers strong computing capacity: through computation offloading, a device can transmit its computing tasks to a remote cloud server for execution, relieving its computing and storage limitations and extending its battery life. However, offloading computing tasks to cloud servers cannot meet the requirements of latency-sensitive services. Edge computing therefore emerged: computing tasks are transmitted to computing nodes at the network edge for execution, bypassing the core network and data center, which localizes services, reduces energy consumption, and meets the low-latency requirements of services.
Devices in the IoT generate large amounts of data, including multimedia information such as video, images and sound, as well as structured data such as temperature, vibration and luminous-flux readings. Many mature technologies exist for processing structured data and then automatically controlling IoT devices, but traditional multimedia processing requires complex computation and is ill-suited to IoT services. Deciding where to offload a device's service usually requires jointly optimizing over many kinds of information, such as whether the device is mobile, the node type, and node resource usage. Deep reinforcement learning, as a big-data analysis tool, has become an important method in many fields of informatics such as visual recognition, natural language processing and bioinformatics, and is an effective way to solve this class of problems.
To assess the state of the prior art, existing papers and patents were searched, compared and analyzed, and the following technical information most relevant to the invention was selected:
In a first prior-art aspect, the patent "A method and device for dynamic offloading of Internet of Things services based on edge computing" (CN109510869A) provides a method and device for dynamically offloading IoT services based on edge computing. The method comprises: S1, obtaining the arrival volume of each type of IoT service at time t; S2, for each service type, determining the offload amounts for the edge cloud service and for the cloud computing center service by maximizing that type's offloading revenue function, given its arrival volume at that moment; and S3, for each service type, selecting from the edge cloud servers the one with the least backlog of that type of service, and offloading to it the edge-cloud offload amount determined in S2. This dynamic offloading method adapts well to the dynamics of task arrivals and has low computational complexity. At each moment it allocates offload amounts between the edge cloud and the cloud computing center for each service type according to the principle of maximizing offloading revenue, then selects the edge cloud with the smallest service queue to process the edge-cloud share, and finally determines the offloading scheme for all IoT services.
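The S3 selection rule described above ("pick the edge cloud server with the least backlog of this service type") can be sketched in a few lines; the server names and backlog figures below are hypothetical, not from CN109510869A:

```python
def pick_least_backlog(backlogs: dict) -> str:
    """Among the candidate edge cloud servers, return the one with the
    smallest backlog of the current IoT service type (step S3 as described)."""
    return min(backlogs, key=backlogs.get)

# Hypothetical backlog (queued offload units) per edge cloud server:
backlogs = {"edge-A": 12, "edge-B": 3, "edge-C": 7}
target = pick_least_backlog(backlogs)
```

Here `pick_least_backlog` would be re-evaluated at every time step, since the queue lengths change as tasks arrive and complete.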
Although that invention adapts well to the dynamics of task arrivals, its consideration of the relevant factors (whether to offload, the offload amount and the offload location) and of service type, user profile, access technology, network traffic, device capability, edge-node attributes and the like is still not comprehensive.
In a second prior-art aspect, the patent "A computation offloading method based on mobile edge computing in the Internet of Things" (CN109788069A) belongs to the technical field of task offloading in the IoT. It draws on theoretical frameworks including the Internet of Things (IoT), mobile edge computing (MEC), mode selection and node matching, and dynamic optimization. The scheme considers four computation offloading modes: local execution, direct offloading to the cloud, offloading to a device side, and device relay-forwarding offloading. Taking into account the influence of social relations between devices on the offloading service level, as well as the long-term dynamics of the system, it constructs a long-term system payoff function over delay and energy consumption, and obtains an edge-computing-based offloading scheme in the IoT through mode selection and node matching. The invention achieves a good balance between delay and energy-consumption performance and improves system reliability and stability, but it cannot obtain a real-time global view of the state of every node in the IoT environment, and therefore cannot derive a better offloading scheme.
In a third prior-art aspect, the patent CN109819046A relates to a method for scheduling virtualized computing resources of the IoT based on edge cooperation, in the field of virtualized wireless networks, and in particular to computing-resource scheduling under edge cooperation for IoT applications. It designs a virtual-computing-resource scheduling architecture based on edge cooperation that fully utilizes the idle virtual resources of horizontally adjacent smart IoT devices, vertical sensor nodes and edge infrastructure, markedly improving resource utilization and the QoS of smart IoT applications. In addition, the proposed algorithm can select an optimal computation offloading path: while optimizing data-transmission delay, it minimizes the computing resources an application occupies, so that more IoT devices can obtain computing resources and applications keep running normally. The architecture is centered on the IoT device and uses a resource-efficient scheduling algorithm that minimizes the computing resources occupied by each application while guaranteeing the QoS each smart IoT application requires. However, the chosen offloading path is insufficiently sensitive to changes in the current state of the IoT environment and lacks dynamic adaptability.
Disclosure of Invention
The invention aims to provide an Internet of Things service offloading decision model combining SDN, deep reinforcement learning (DRL) and edge computing, in order to overcome a series of defects of the prior art.
The following embodiments of the invention provide Internet of Things service offloading decision methods based on edge computing and deep reinforcement learning. The IoT is configured as an SDIoT comprising a plurality of regions, each containing a regional SDN controller configured with a service offloading decision model; the regional SDN controller outputs the service offloading decisions for the intelligent services in its region according to that model.
In an embodiment of one aspect, the service offloading problem addressed by the decision model is to minimize the execution delay of each intelligent service while allocating the computing resources of the offloading target without exceeding that target's total resources, so that the number of intelligent services the IoT can execute simultaneously is maximized. As an improvement, the problem is solved within the decision model by a deep reinforcement learning algorithm, such as DDPG or DQN.
In some embodiments that solve the problem with DDPG: in a first aspect, the deep reinforcement learning algorithm is a DDPG algorithm that focuses on the optimal policy and the optimal cumulative reward; in a second aspect, the DDPG algorithm is equipped with an experience pool; in a third aspect, the value function of the DDPG algorithm, $Q^{\pi}(s, a)$, is set to represent the total reward an agent obtains for an intelligent service by taking action $a$ in state $s$ and then following policy $\pi$, where the action encodes the decision result for a task $K_p$; in a fourth aspect, the reward function of the DDPG algorithm is expressed as the difference between the time the intelligent service would require if executed entirely locally and its execution time under the decision.
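A minimal sketch of the experience pool and the reward function described above; the identifiers are illustrative, and the reward is exactly the stated difference between fully-local execution time and execution time under the decision:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool used by DDPG: stores (s, a, r, s') transitions and
    hands back random mini-batches for off-policy training."""
    def __init__(self, capacity: int = 10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def offload_reward(local_time: float, decided_time: float) -> float:
    # Reward = time needed if the intelligent service ran entirely locally,
    # minus its execution time under the chosen offloading decision.
    # A positive reward means the decision beat pure local execution.
    return local_time - decided_time
```

For example, a decision that finishes a service needing 10 s of local execution in 6 s yields a reward of 4 s.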
In an embodiment of one aspect, the Internet of Things service offloading decision method based on edge computing and deep reinforcement learning may be described as comprising the following steps:
s100, establishing an SDN-based Internet of things architecture, wherein at least one layer of the SDN-based Internet of things architecture is provided with a plurality of regional SDN controllers, and a calculation unloading decision algorithm of one region is dynamically executed by the regional SDN controllers in the region;
s200, establishing a service unloading problem model based on a task unloading mode;
and S300, for the computation offloading decision of an intelligent service in a region, having the regional SDN controller configured with the computation offloading decision algorithm output the decision automatically.
In one embodiment of this aspect, the Internet of Things architecture comprises a cloud-service master control layer, a regional SDN control layer, an edge node layer, a data layer and a device layer; the computation offloading decision algorithm is a deep reinforcement learning algorithm.
In one embodiment of this aspect, the computation offloading decision algorithm is also used for allocating the region's computing resources.
The main inventive concept of all embodiments of the invention lies in three aspects. First, a multi-layer IoT architecture is designed comprising an IoT device layer, a data layer, an edge node layer, a regional SDN control layer and a cloud-service master control layer; this architecture accelerates computation and mitigates the problems of heavy data-transmission load and security in IoT. Second, under four task offloading modes the method can use IoT network resources rationally, meet services' low-latency requirements, and at the same time protect and isolate sensitive data, reducing the risk of privacy leakage. Third, the solution of intelligent-service computation offloading decisions in the SDIoT is cast as a target optimization model with multiple constraints, a DDPG-based task offloading decision algorithm is designed, and the optimal service offloading strategy is realized.
The invention provides an Internet of Things service offloading decision model based on SDN and DDPG, whose technical effects in at least one aspect include the following. It addresses the rigid network structure inherent in traditional IoT: SDN technology separates the network control plane from the data forwarding plane, yielding a centralized view of the whole network, ensuring network security, and providing the strong resource management and orchestration capability of programmable interfaces. By introducing edge computing, the computing and storage capabilities of the cloud data center are moved to the network edge, providing services with a low-latency, high-bandwidth runtime environment and meeting the computing demands of emerging applications such as smart cities, intelligent transportation and smart homes. Finally, exploiting DDPG's ability to learn a policy directly from high-dimensional raw data, the non-convex optimization problem with a complex objective function and constraints is solved, and the resulting DDPG-based service offloading algorithm can make offloading decisions dynamically and efficiently. On the basis of minimizing delay, the proposed model uses IoT network resources rationally and shows excellent performance and good stability.
Drawings
Fig. 1 is a flowchart of an Internet of Things service offloading decision method based on edge computing and deep reinforcement learning in an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the SDN-based Internet of Things architecture according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a DDPG algorithm-based service offloading decision model according to an embodiment of the present invention;
FIG. 4 is a graphical illustration of a comparison of the effect of maximum distance between adjacent regions on average data transmission rate in various embodiments of the invention;
FIG. 5 is a graph illustrating the effect of maximum distance between adjacent regions on average data transmission rate in various embodiments of the present invention;
FIG. 6 is a graphical illustration comparing the effect of the number of devices connected to each edge node on the average data rate in various embodiments of the invention;
FIG. 7 is a graphical representation of the effect of the number of devices connected to each edge node on the average data rate in various embodiments of the present invention;
FIG. 8 is a graphical illustration of a comparison of the effect of task count on task offload success rate in various embodiments of the invention;
FIG. 9 is a diagram illustrating a comparison of the effect of the number of devices on task execution time in various embodiments of the invention.
Detailed Description
It should be noted that service offloading in the embodiments of the invention is computation offloading: an intelligent-service task with a large computation load is allocated to one or more computing nodes with sufficient computing resources for processing, after which each partial result is retrieved from those nodes and the computing resources are released. In the prior art the computing node is typically a proxy server. Computation offloading technology was first applied in mobile cloud computing (MCC); in mobile edge computing (MEC), the computation offloading decision may take the following three forms. 1. Local execution: the whole computation is completed locally on the UE. 2. Full offloading: the entire computation is offloaded to and processed by the MEC server. 3. Partial offloading: part of the computation is processed locally, while the rest is offloaded to the MEC server. For Internet of Things (IoT) systems, these schemes cannot be adopted directly, because the points of emphasis differ.
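The three MEC decision forms above reduce to a completion-time comparison; the sketch below assumes scalar delay estimates as inputs (hypothetical quantities, not a model from the patent):

```python
from enum import Enum

class OffloadScheme(Enum):
    LOCAL = "local execution"       # whole computation done on the UE
    FULL = "full offloading"        # whole computation processed by the MEC
    PARTIAL = "partial offloading"  # split between the UE and the MEC server

def choose_scheme(local_delay: float, full_delay: float,
                  partial_delay: float) -> OffloadScheme:
    """Pick whichever of the three schemes minimises completion time."""
    options = {OffloadScheme.LOCAL: local_delay,
               OffloadScheme.FULL: full_delay,
               OffloadScheme.PARTIAL: partial_delay}
    return min(options, key=options.get)
```

In practice each delay estimate would itself be computed from task size, CPU rates and link rates, rather than supplied directly.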
Through the following embodiments, the invention provides several Internet of Things service offloading decision methods based on edge computing and deep reinforcement learning. As shown in Fig. 1, the method comprises the following steps S100 to S300:
s100, an SDN-based Internet of things architecture is established, and at least one layer of the Internet of things architecture is provided with a plurality of regional SDN controllers. Namely, an internet of things framework for managing areas is established, and a calculation unloading decision algorithm of one area is dynamically operated by at least one area SDN controller.
S200, a service offloading problem model is established based on the task offloading modes. The model supplies a decision vector that can be processed and analyzed, and the intelligent-service computation-offloading decision problem in the SDIoT is cast as a target optimization model with multiple constraints.
And S300, the computation offloading decision for an intelligent service in a region is output automatically by the region's SDN controller configured with the computation offloading decision algorithm. In particular, in some embodiments the computation offloading decision algorithm is a service offloading decision model configured on the regional SDN controller.
In a first embodiment, the service offloading problem model concerns a specified IoT system based on an SDN Internet of Things architecture, i.e. an SDIoT architecture. As shown in Fig. 2, the architecture mainly comprises five parts from top to bottom in its communication structure: a cloud-service master control layer (Master Control Layer), a regional SDN control layer (Cell SDN Control Layer), an edge node layer (Edge Node Layer), a data layer (Data Layer), and a device layer (Device Layer).
Wherein,
The cloud-service master control layer comprises the web server (Web Server), database system (DataBase System), application server (Application Server) and SDN master controller (SDN Master Controller) involved in running the cloud service platform's applications. The cloud service platform can provide running applications as services, such as equipment inspection, ambient-temperature monitoring and resource allocation systems. This embodiment includes one or more SDN master controllers with the following functions: managing the regional SDN controllers and granting them authenticated access rights; the northbound interfaces (Northbound Interface) of the regional SDN controllers connect to the various applications of the cloud service platform.
The regional SDN control layer comprises a plurality of SDN controllers (SDN Controller). In each embodiment of the invention, the whole IoT system is managed by region: each region (Cell) has at least one designated SDN controller, i.e. a regional SDN controller (Cell SDN Controller), and unless otherwise specified the SDN controllers mentioned in the embodiments are all regional SDN controllers. In this embodiment a region contains a plurality of edge computing nodes (Edge Computing Node), and the regional SDN controller is mainly responsible for orchestrating and managing the edge computing nodes in its region and for edge computation offloading decisions. Each SDN controller dynamically runs the computation offloading decision algorithm by obtaining the current resource-usage state of every edge computing node in its region and the computing-task demands from the domain controller (Domain Controller), in combination with the DDPG algorithm.
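One decision round of a regional SDN controller, as described above, can be sketched as follows: the controller flattens the per-node resource state and the task demand into a state vector and lets a learned policy (e.g. a DDPG actor) pick a target node. All class and field names here are illustrative assumptions, not the patent's notation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EdgeNodeState:
    node_id: str
    free_cpu_cycles: float   # remaining compute capacity of the node
    queue_len: int           # tasks currently backlogged on the node

@dataclass
class TaskDemand:
    task_id: str
    cpu_cycles: float        # cycles the task needs (Q_p in the patent)
    data_bytes: float        # data to transmit when offloading (S_p)

def controller_step(nodes: List[EdgeNodeState],
                    demand: TaskDemand,
                    policy: Callable[[List[float]], int]) -> str:
    """Build the state vector and return the id of the node the policy picks."""
    state = [demand.cpu_cycles, demand.data_bytes]
    for n in nodes:
        state.extend([n.free_cpu_cycles, float(n.queue_len)])
    return nodes[policy(state)].node_id
```

A trained actor network would play the role of `policy`; a greedy stand-in that picks the node with the most free cycles is enough to exercise the interface.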
The edge node layer comprises the various edge computing nodes and one or more cache nodes (Cache Node). In the traditional cloud computing mode, users must transmit data to a remote cloud center for computation; for large-scale data this not only increases transmission delay but also raises the risk of data leakage. The edge computing nodes of this embodiment are closer to the data source in the network topology, and preferably also in geographic distance, so transmission delay is significantly reduced, users receive low-latency, highly stable service, and user experience improves. Each cache node of the edge node layer mainly caches designated high-frequency data, or data related to a class of high-frequency requests, reducing repeated requests in the network and relieving bandwidth pressure in a predictive manner.
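The cache node's role (serving repeated high-frequency requests at the edge) can be sketched with a small bounded store. The patent does not specify an eviction policy, so the least-recently-used policy below is an assumption:

```python
from collections import OrderedDict

class EdgeCache:
    """Minimal cache-node sketch: hits are served locally, misses return
    None so the caller can fetch upstream; least-recently-used eviction."""
    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)    # refresh recency on a hit
            return self.store[key]
        return None                        # miss: fetch from the data source

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False) # evict least recently used entry
```

A frequency-based policy would serve the "high-frequency data" role equally well; LRU is simply the most compact to show.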
The data layer comprises a number of industrial switches (Switch), wireless access points (AP) and domain controllers. In this embodiment, an AP handles both the authentication of each access device and the data transmission of the device layer, while the domain controller is mainly responsible for dynamically managing network traffic and uploading computing-task requirements to the corresponding SDN controller in the regional SDN control layer, which then makes the computation offloading decision.
The device layer covers the various IoT domains, such as the sensors found in smart homes (Smart Home), automatic factories (Automatic Factory), industrial parks (Industrial Park) and intelligent transportation. The sensors in these domains generate large amounts of data during operation; they mainly collect device operating-state information and the generated data, then transmit it, after access point (AP) authentication, to the data layer for processing.
The whole SDIoT framework forms a multi-level working mode that can accelerate computation. In an IoT with this structure, deciding whether a device's computing task should run locally or be offloaded to an edge computing node is an important and arduous task; in the embodiments of the invention it is completed jointly by the domain controller and the regional SDN controller, according to the service offloading problem model obtained in step S200 and the DDPG algorithm provided in step S300.
In a second embodiment, a computing task in the IoT system has four possible allocation decisions, and the computing tasks of this embodiment involve the following four offloading modes: local computing, cloud offloading (offloading the service to the Cloud Server), edge offloading (offloading the service to an Edge Computing Node), and idle-device offloading (offloading the service to an idle device terminal). Cloud offloading and edge-node offloading target tasks with high computing-capacity requirements; for tasks with stricter real-time requirements, executing locally or on an adjacent idle device terminal better avoids excessive execution time. One of the technical problems this embodiment must face is that computation offloading of an intelligent service in IoT, i.e. service offloading, usually requires jointly optimizing over transmission delay, computation delay, device mobility and other factors in order to make the optimal service offloading decision.
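The four mutually exclusive offloading modes can be encoded as 0/1 decision variables with exactly one set per task; the one-hot encoding below is an illustrative rendering of the a_{D,X,K} indicators, not the patent's exact notation:

```python
from enum import Enum
from typing import List

class OffloadMode(Enum):
    LOCAL = 0        # local computing on the originating device
    CLOUD = 1        # offloading to the remote cloud server
    EDGE = 2         # offloading to an edge computing node
    IDLE_DEVICE = 3  # offloading to a nearby idle device terminal

def decision_vector(mode: OffloadMode) -> List[int]:
    """One-hot 0/1 decision variables for a single task: exactly one of
    the four modes is selected."""
    v = [0, 0, 0, 0]
    v[mode.value] = 1
    return v
```

The "exactly one mode per task" property is what the per-task constraint in the optimization model enforces.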
The present embodiment may or may not be based on the specific Internet of Things architecture of the first embodiment, but should in all cases include one or more area units (Cells) the same as, or equivalent to, those described here. In this embodiment, the device terminals of one layer and the edge computing nodes of one layer in the IoT system are pre-partitioned into multiple area units (Cells) for management, denoted by the set $\{Cell_1, Cell_2, \ldots, Cell_N\}$. For one area unit $Cell_i$, the set $D^i = \{D^i_1, D^i_2, \ldots\}$ represents its device terminals, and the set $E^i = \{E^i_1, E^i_2, \ldots\}$ represents all edge computing nodes within $Cell_i$. In the IoT system of this embodiment, an intelligent service, regarded as one computing service, is complex and consists of several different tasks, so the embodiment divides one computing service into multiple tasks that are offloaded and computed separately. Suppose that at one moment a particular computing service of device $D_i$ is the set $C = \{K_1, K_2, \ldots, K_p, \ldots\}$, where each element $K_p$ is one task of that computing service. The embodiment uses the action-policy parameter $a_{D,X,K} \in \{0, 1\}$ to represent the decision result for task $K_p$, with the following indicator definitions: when device $D_i$ chooses to execute task $K_p$ locally, $a^{l}_{D_i,K_p} = 1$; when the task is chosen to be executed on the remote (Remote) cloud, $a^{r}_{D_i,K_p} = 1$; when the task is executed on an edge computing node $E_j$, $a^{e}_{D_i,E_j,K_p} = 1$; and if the device chooses an idle intelligent terminal $D_j$ for execution, $a^{d}_{D_i,D_j,K_p} = 1$.
Further, this embodiment defines for each task $K_p$ a structure of formula (1):

$$K_p = (Q_p, S_p, T_m) \quad (1)$$

wherein $Q_p$ indicates the total number of CPU cycles required to complete computing task $K_p$, i.e., the amount of computing resources required to complete task $K_p$; $S_p$ indicates the total amount of data transmitted when offloading task $K_p$; and $T_m$ indicates the maximum delay allowed for completing computing task $K_p$. The task delay checked against this maximum delay in this embodiment mainly comprises the sum of the computation delay associated with $Q_p$ and the communication delay associated with $S_p$.
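The task structure of formula (1) can be sketched minimally as follows; the field names and all numeric values are illustrative assumptions, not values taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A computing task K_p = (Q_p, S_p, T_m) as in formula (1)."""
    q: float      # Q_p: total CPU cycles required (computing resources)
    s: float      # S_p: total data transmitted when offloading (bits)
    t_max: float  # T_m: maximum delay allowed for completion

def is_feasible(task: Task, total_delay: float) -> bool:
    """A task completes successfully only if its total delay
    (computation delay + communication delay) stays within T_m."""
    return total_delay <= task.t_max

k1 = Task(q=400.0, s=1e6, t_max=2.0)  # hypothetical parameters
print(is_feasible(k1, 1.5))
```

The later simulation (FIG. 8) counts exactly this feasibility check as offloading "success" or "failure".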
In this embodiment, the communication delay of a local computation task is considered to be zero, while for a cloud computing task, an edge computing node task, or an idle device terminal task, the transmission delay needs to be considered. Exemplarily, the communication delay of a specific task of one device $D_i$ offloaded to object $X$ can be expressed as follows:

$$T_{p,X}^{comm} = \frac{S_p}{B \log_2\!\left(1 + \dfrac{P_{D_i}\, h\, d_{D_i,X}^{-\alpha}}{N_0}\right)} \quad (2)$$

wherein parameter $B$ represents the bandwidth of the channel between device $D_i$ and $X$; $P_{D_i}$ represents the transmit power of the device; the path loss between the device and the offloading object $X$ is modeled as $d_{D_i,X}^{-\alpha}$, where $d_{D_i,X}$ is the distance between the device and the offloading object and $\alpha$ is the path loss factor; $h$ represents the channel fading factor of the uplink; and $N_0$ represents the Gaussian white noise power.
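Formula (2) divides the offloaded data volume by the Shannon-type uplink rate. A small sketch under that assumption; every numeric parameter below is invented for illustration:

```python
import math

def comm_delay(s_p, bandwidth, tx_power, distance, alpha, h, n0):
    """Transmission delay of offloading S_p bits to object X, per formula (2):
    uplink rate R = B * log2(1 + P * h * d^-alpha / N0), delay = S_p / R."""
    snr = tx_power * h * distance ** (-alpha) / n0
    rate = bandwidth * math.log2(1.0 + snr)  # achievable bits per second
    return s_p / rate

# hypothetical numbers: a 1 Mbit task, 1 MHz channel, 100 m to the edge node
d = comm_delay(s_p=1e6, bandwidth=1e6, tx_power=0.5, distance=100.0,
               alpha=2.0, h=1.0, n0=1e-9)
print(f"{d:.3f} s")
```

Doubling $S_p$ doubles the delay, while increasing the distance $d_{D_i,X}$ lowers the SNR and hence the rate, which matches the inter-region interference discussion in the fourth embodiment.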
Exemplarily, for device D, for four task computation modesiTask K ofpTask delay T ofpAre respectively expressed as:
equipment DiIts own computing power, i.e. the frequency of the CPU is
Figure BDA0002486073120000126
Then task KpTime of execution locally
Figure BDA0002486073120000127
Comprises the following steps:
Figure BDA0002486073120000128
setting the computing power of a specific cloud as fRThen task KpExecution time on remote cloud
Figure BDA0002486073120000129
Comprises the following steps:
Figure BDA00024860731200001210
let the edge node computing power be
Figure BDA00024860731200001211
Then task KpExecution time on edge node
Figure BDA00024860731200001212
Comprises the following steps:
Figure BDA00024860731200001213
setting the computing power of the idle equipment terminal as
Figure BDA00024860731200001214
Then task KpExecution time on idle device terminal
Figure BDA00024860731200001215
Comprises the following steps:
Figure BDA00024860731200001216
Equations (1) to (6) constitute the internet of things service offloading calculation model of the present embodiment. Based on the model, for a specific task $K_p$, the total task latency can be expressed as:

$$T_p = a_{D_i,L,K_p} T_p^{L} + a_{D_i,R,K_p}\left(T_p^{R} + T_{p,R}^{comm}\right) + a_{D_i,M,K_p}\left(T_p^{M} + T_{p,M}^{comm}\right) + a_{D_i,D_j,K_p}\left(T_p^{D_j} + T_{p,D_j}^{comm}\right) \quad (7)$$
For a specific intelligent service $\mathcal{S} = \{K_1, K_2, \ldots, K_n\}$ of device $D_i$, define its offloading decision vector $A_{\mathcal{S}} = (a_1, a_2, \ldots, a_n)$, wherein the elements $a_1$ to $a_n$ are the action policy parameters of tasks $K_1$ to $K_n$. The constraint conditions are that each decision coefficient is binary,

$$a_{D_i,X,K_p} \in \{0,1\}$$

and that each task selects exactly one of the four offloading modes:

$$a_{D_i,L,K_p} + a_{D_i,R,K_p} + a_{D_i,M,K_p} + a_{D_i,D_j,K_p} = 1, \quad \forall K_p \in \mathcal{S}$$
For the edge computing nodes, let the total amount of resources of edge computing node $M_i$ be $F_{M_i}$, and let the total amount of resources of idle device terminal $D_j$ be $F_{D_j}$. The computing resources of the remote cloud are abundant and therefore need not be considered.
The service offloading problem model of one intelligent service in the IoT system of the present embodiment can be described as follows:

$$\min_{A_{\mathcal{S}}} \sum_{p=1}^{n} T_p \quad \text{s.t. } C1 \sim C5 \quad (8)$$

In formula (8), step S200 is realized by solving for the minimum execution delay of an intelligent service $\mathcal{S}$ under the constraint conditions C1 to C5, while the computing resources of the offloading objects are allocated as reasonably as possible without exceeding their total resource amounts, so that the number of intelligent services that can be executed is maximized; this solution target of the service offloading problem model is realized through step S300.
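For a tiny instance, the problem of formula (8) can be illustrated by exhaustive search over the decision vector; all delays, demands, and capacities below are invented for the example, and the patent itself solves the model with DDPG rather than brute force:

```python
from itertools import product

MODES = ("local", "cloud", "edge", "idle")

def total_delay(decision, comp, comm):
    """Formula (7): per task, computation delay plus communication delay
    (the latter is zero for local execution)."""
    return sum(comp[m][p] + (0.0 if m == "local" else comm[p])
               for p, m in enumerate(decision))

def best_offload(n_tasks, comp, comm, demand, edge_cap, idle_cap):
    """Search the decision vector A_S minimizing total delay without
    exceeding the edge node's and idle terminal's resource totals
    (the cloud is treated as unconstrained). Feasible only for tiny n."""
    best, best_t = None, float("inf")
    for decision in product(MODES, repeat=n_tasks):
        edge_use = sum(demand[p] for p, m in enumerate(decision) if m == "edge")
        idle_use = sum(demand[p] for p, m in enumerate(decision) if m == "idle")
        if edge_use > edge_cap or idle_use > idle_cap:
            continue  # constraint violated: skip this decision vector
        t = total_delay(decision, comp, comm)
        if t < best_t:
            best, best_t = decision, t
    return best, best_t

# two-task toy instance with fabricated delays and capacities
comp = {"local": [4.0, 6.0], "cloud": [0.5, 0.8],
        "edge": [1.0, 1.5], "idle": [2.0, 3.0]}
comm = [0.3, 0.4]          # transmission delay per task when offloaded
demand = [400.0, 600.0]    # cycles each task occupies on the chosen node
dec, t = best_offload(2, comp, comm, demand, edge_cap=650.0, idle_cap=200.0)
print(dec, t)
```

The search space grows as $4^n$, which is the practical motivation for the learned decision model of the third embodiment.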
In the third embodiment, the DDPG algorithm in deep reinforcement learning is used to solve the service offloading problem model obtained in step S200, so as to implement the method described in step S300.
In this embodiment, the entire IoT system is divided into a plurality of areas (Cells), and one SDN controller is disposed in each area to obtain a scene view of the whole area, so as to provide the offloading decisions of the intelligent services and the resource allocation of the offloading objects in the whole area, and to construct the corresponding service offloading problem model. In this embodiment, the SDN controller includes a module configured with a reinforcement learning algorithm, and each smart device terminal obtains an optimal task offloading scheme through a preset service offloading decision model.
The basic elements of reinforcement learning include the Agent, the environment, the state s, the policy π, the action a, and the reward r. The Agent of this embodiment perceives the current state s_t from the SDIoT environment, selects an action a_t according to policy π, acts on the SDIoT environment to obtain a reward r_t, and transitions to the next state s_{t+1}. In FIG. 3, state s corresponds to State, action a corresponds to Action, r refers to the Reward, and TS is the next state. The Agent in this embodiment represents an algorithm agent, i.e., the service offloading decision algorithm or service offloading decision model, configured and operated in the Cell SDN Controller, and is responsible for acquiring and processing the above information. The policy is each function process involved in the operation of the algorithm. The reinforcement learning process of this embodiment also introduces a value function V because the reward r_t only reflects the return of the current action and state and cannot reflect the influence on future returns; V contains the current reward and the estimated discounted future reward (with discount factor γ). The deep reinforcement learning method utilizes a deep neural network to learn a policy over a continuous action space, parameterizes the policy, and outputs it as an Action. Step S300 in this embodiment may be implemented by various deep reinforcement learning algorithms, including the DDPG (Deep Deterministic Policy Gradient) series of algorithms, the DQN (Deep Q-Network) series of algorithms, or a conventional RL (Reinforcement Learning) algorithm.
It should be noted that, for problems in which machine learning solves a fixed environment model, reinforcement learning differs from supervised and unsupervised learning mainly in that there is no labeled data, only a reward signal, and that signal is not necessarily real-time; it mainly studies time-series data rather than independently and identically distributed data, and the current behavior affects subsequent data. A person skilled in the art therefore has no motivation to necessarily select reinforcement learning as the machine learning solution.
One idea of the present invention is to build the service offloading decision model of the present invention by combining a specially constructed service offloading problem model with an agent that combines Actor-Critic and DDPG, which are policy-based and value-based methods focusing on the optimal policy rather than the optimal action of each step.
Exemplarily, as shown in FIG. 3, this embodiment preferably uses a service offloading decision model suited to the DDPG algorithm to obtain a dynamically optimized service offloading decision result by solving the service offloading problem model obtained in S200. It adopts the Actor-Critic architecture of reinforcement learning, which comprises four neural networks, namely an Actor_M network, an Actor_T network, a Critic_M network, and a Critic_T network, where the Actor_M and Actor_T networks are identical in structure and the Critic_M and Critic_T networks are identical in structure. The Actor_M and Critic_M networks form the Main Network used for training and optimizing the network parameters (Policy), and the Actor_T and Critic_T networks form the Target Network used for generating the training data set.
Based on the above neural network architecture, the present embodiment determines a specific state space, an action space, a reward function, a value function, a deep deterministic policy gradient and an experience pool in the service offloading decision model of the present embodiment in the following manner.
In a first aspect of this embodiment, the method for determining the state space used to acquire the environment state is as follows: centered on the device that needs task offloading, the states of the surrounding edge computing nodes and idle device terminals, mainly their amounts of remaining computing resources, are collected; "surrounding" here means under the same domain controller. For the edge computing nodes, the remaining resource amount of each node in the area is defined as the set $F_M = \{F_{M_1}, F_{M_2}, \ldots\}$, wherein $F_{M_k}$ is the remaining resource amount of edge node $M_k$; the remaining resource amounts of the idle device terminals form the set $F_D = \{F_{D_1}, F_{D_2}, \ldots\}$, wherein $F_{D_j}$ is the remaining resource amount of idle terminal $D_j$. For an intelligent service $\mathcal{S}$, define its total delay $T_{\mathcal{S}}$. The state space is expressed in this embodiment as equation (9):

$$s_t = \left(F_M(t),\, F_D(t),\, T_{\mathcal{S}}(t)\right) \quad (9)$$

The subscript t refers to a specific time, and the states at different times change dynamically.
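Concatenating the remaining-resource sets and the service delay into the state vector of equation (9) can be sketched as follows; all values are illustrative (the resource numbers echo the computing frequencies used in the later simulation, but that pairing is an assumption):

```python
def build_state(edge_free, idle_free, service_delay):
    """State s_t of equation (9): remaining resources of the cell's edge
    nodes and idle terminals, plus the service's current total delay."""
    return tuple(edge_free) + tuple(idle_free) + (service_delay,)

s_t = build_state(edge_free=[650.0, 600.0], idle_free=[200.0, 150.0],
                  service_delay=3.2)
print(s_t)
```

The resulting flat tuple is what the Actor network would consume as its observation input.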
The determination method of the action space of the second aspect of this embodiment is as follows: within one area (Cell) of the IoT system, for one intelligent service $\mathcal{S}$, one specific service offloading decision is defined as a vector of the form (10):

$$a_t = (a_1, a_2, \ldots, a_n) \quad (10)$$

wherein each element $a_p$ is the action policy parameter deciding the offloading mode of task $K_p$, subject to the constraint that exactly one of the four modes is selected for each task.
The third aspect of this embodiment obtains the reward function. According to the description of the service offloading problem model obtained in step S200, the goal of service offloading is to minimize the service execution time $T_{\mathcal{S}}$, while the goal of the DDPG algorithm is to maximize the expected reward after performing an action, so the reward is negatively related to the execution time. The reward function is defined by the following calculation formula:

$$r(s_t, a_t) = T_{local} - T_{\mathcal{S}}(s_t, a_t) \quad (11)$$

wherein $T_{local}$ is the time required for the intelligent service $\mathcal{S}$ to execute entirely locally, $s_t$ represents the state at time t in the state space, and $a_t$ represents the action taken at time t in the action space; specifically, the action is the determined action policy parameter.
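Formula (11) rewards a decision by how much faster it is than all-local execution; a decision slower than local computing earns a negative reward. A one-function sketch with arbitrary numbers:

```python
def reward(t_local, t_decision):
    """Formula (11): reward = time for the whole service to run locally
    minus the execution time under the chosen offloading decision, so a
    faster decision earns a larger (positive) reward."""
    return t_local - t_decision

print(reward(10.0, 2.0))   # beneficial offloading decision
print(reward(10.0, 12.0))  # worse than running everything locally
```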
The value function of the fourth aspect of the present embodiment is obtained in the following manner. It is noted that the prior-art value function is generally defined as $V_\pi(s)$, used to evaluate the merit of a certain state and action by evaluating the expected reward of the agent selecting policy π in state s. This class of value function is defined as follows:

$$V_\pi(s) = \mathbb{E}_\pi\left[R_t \mid s_t = s\right] \quad (12)$$

wherein $\mathbb{E}_\pi$ indicates the expectation operation and $R_t$ represents the reward obtained from the initial state s following the policy π.
In this embodiment, the value function is the action-value function of action a, defined as $Q_\pi(s, a)$, representing the total reward obtained by the agent when taking action a in state s and following policy π thereafter. It is defined specifically as follows:

$$Q_\pi(s, a) = \mathbb{E}\left[r(s, a) + \gamma Q_\pi(s', a')\right]$$

where $r(s, a)$ represents the reward expected to be earned by taking action a in state s, γ represents a decay factor ranging from 0 to 1, and s' represents the next state reached by taking action a in state s.
Further, according to the Bellman equation, the action-value function in the DDPG algorithm applicable to this embodiment can be obtained as follows:

$$Q_\mu(s_t, a_t) = \mathbb{E}\left[r(s_t, a_t) + \gamma\, Q_\mu(s_{t+1}, \mu(s_{t+1}))\right] \quad (13)$$

where μ denotes the policy generated by the Actor_M network.
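The Bellman backup of formula (13) can be sketched with trivial stand-in functions in place of the target Critic and Actor neural networks (all numbers below are arbitrary):

```python
def td_target(r, s_next, gamma, critic_t, actor_t):
    """Bellman backup of formula (13): the target value for Q_mu(s_t, a_t)
    bootstraps from the target networks' estimate at the next state."""
    a_next = actor_t(s_next)                    # mu(s_{t+1})
    return r + gamma * critic_t(s_next, a_next)

# stand-in target networks (the real ones are neural networks)
actor_t = lambda s: 0.5 * s
critic_t = lambda s, a: s + a

y = td_target(r=1.0, s_next=2.0, gamma=0.9, critic_t=critic_t, actor_t=actor_t)
print(y)
```

This scalar target is exactly the "label" $y_i$ that the critic loss of formula (14) regresses toward.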
The deep deterministic policy gradient of the fifth aspect of the present embodiment is a distributed deep deterministic policy gradient. The Policy Gradient of this embodiment is a policy search technique and a gradient-based optimization algorithm. It aims to model and optimize the policy so as to search directly for the best behavior policy of the Agent. For the Critic_M network and Actor_M network in the DDPG algorithm of this embodiment, a gradient update method is adopted to update the parameters $\theta^Q$ and $\theta^\mu$; for the Critic_T network and Actor_T network, a soft update is adopted to update the parameters $\theta^{Q'}$ and $\theta^{\mu'}$.
When updating the Critic_M network, a Loss Function needs to be calculated first, with the following calculation formula:

$$L(\theta^Q) = \mathbb{E}\left[\left(y_i - Q(s_i, a_i \mid \theta^Q)\right)^2\right] \quad (14)$$

wherein $y_i$ depends on the Critic_T and Actor_T networks being learned and can be regarded as a "label" (Tag), calculated as follows:

$$y_i = r(s_i, a_i) + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \quad (15)$$
The formula for the gradient calculation is as follows:

$$\nabla_{\theta^\mu} J \approx \mathbb{E}\left[\left.\nabla_a Q(s_i, a \mid \theta^Q)\right|_{a=\mu(s_i)} \nabla_{\theta^\mu}\, \mu(s_i \mid \theta^\mu)\right] \quad (16)$$

wherein the function J is used to measure the performance of policy μ, with the following calculation formula:

$$J(\mu) = \mathbb{E}_{s \sim \rho^\beta}\left[Q_\mu(s, \mu(s))\right] \quad (17)$$

$\rho^\beta$ represents the distribution function of state s, i.e., J(μ) solves the expected value of $Q_\mu(s, \mu(s))$ when s is distributed according to $\rho^\beta$; i refers to the number of a particular region.
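The critic loss of formulas (14) and (15), together with the soft target-network update (the conventional form $\theta_T \leftarrow \tau\theta_M + (1-\tau)\theta_T$ is assumed here for the fixed update equations referenced as (18) and (19)), can be sketched with toy linear functions standing in for the four networks; the mini-batch and all coefficients are fabricated:

```python
def critic_loss(transitions, q, q_target, mu_target, gamma):
    """Mean-squared TD error of formulas (14)-(15) over a sampled mini-batch;
    q, q_target, mu_target stand in for Critic_M, Critic_T and Actor_T."""
    losses = []
    for s, a, r, s_next in transitions:
        y = r + gamma * q_target(s_next, mu_target(s_next))  # label of (15)
        losses.append((y - q(s, a)) ** 2)
    return sum(losses) / len(losses)

def soft_update(theta_t, theta_m, tau=0.01):
    """Assumed soft update of the target-network parameters:
    theta_T <- tau * theta_M + (1 - tau) * theta_T."""
    return [tau * m + (1.0 - tau) * t for t, m in zip(theta_t, theta_m)]

# toy linear 'networks' and a fabricated two-transition mini-batch
q = lambda s, a: 0.2 * s + 0.3 * a
q_target = lambda s, a: 0.2 * s + 0.3 * a
mu_target = lambda s: 0.5 * s
transitions = [(1.0, 0.5, 1.0, 2.0), (2.0, 1.0, 0.5, 3.0)]

loss = critic_loss(transitions, q, q_target, mu_target, gamma=0.9)
upd = soft_update([0.0, 0.0], [1.0, 1.0], tau=0.1)
print(loss, upd)
```

The small τ makes the target networks trail the main networks slowly, which stabilizes the labels $y_i$ between updates.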
The sixth aspect of this embodiment includes an experience pool (replay buffer). This embodiment introduces the replay-buffer concept into the DDPG algorithm. The main motivation is that when the Actor network interacts with the environment, the generated transition data sequences are highly correlated in time; if these sequences are used directly for training, the neural network overfits and does not converge easily. Therefore, the Actor of the DDPG stores the transition data in the experience replay buffer, and mini-batch data are then randomly sampled from the buffer during training, so that the sampled data can be regarded as uncorrelated. An embodiment of the present invention that implements the Agent modeling of step S300 based on the DQN algorithm also includes an experience pool.
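A minimal replay-buffer sketch matching this description; the capacity, batch size, and transition contents are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience pool: stores (s, a, r, s') transitions and
    returns uniformly random mini-batches, breaking the temporal
    correlation of consecutively generated transitions."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

pool = ReplayBuffer(capacity=100)
for t in range(150):                 # overfill to exercise eviction
    pool.store(t, t % 4, float(-t), t + 1)
print(len(pool), len(pool.sample(32)))
```

The `deque` with `maxlen` silently drops the oldest transitions once capacity is reached, so the buffer always reflects recent environment dynamics.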
The service offloading decision model of this embodiment designs a distributed deep deterministic policy gradient based on the DDPG algorithm in each deep reinforcement learning loop, and uses this algorithm to process the SDN-based IoT service offloading decision problem described by the service offloading problem model provided in step S200.
In the internet of things system of this embodiment, comprising the N areas (Cells) described in step S100, and in combination with the details of the service offloading decision model algorithm provided in FIG. 3, the following Algorithm 1, described in pseudocode, expresses the processing procedure of the model.

[Algorithm 1: pseudocode of the distributed DDPG-based service offloading decision procedure; reproduced only as an image in the original publication.]
In Algorithm 1, the variable t corresponds to the subscript t of the state-space variable of equation (9); the variable i is identical in meaning to the index i in formulas (14) to (17), i.e., the region number, and serves to distinguish the different regions. Equations (18) and (19) are the fixed equations for the target-network parameter update.
In the fourth embodiment, three specific examples are provided to illustrate the non-obvious effects of the technical solution of the present invention. The first specific example adopts the DDPG algorithm provided by the third embodiment to obtain the decision result of the service offloading model, the second adopts the DQN algorithm, and the third adopts a conventional reinforcement learning (RL) algorithm. The specific examples of this embodiment share the same parameters and devices, and the comparative effects presented are based on simulation or actual internet of things system tests.
In this embodiment, a layered multi-region internet of things system is first established through step S100 and divided into 25 regions, where the minimum distance between two adjacent regions is $l_{min} = 10\,\mathrm{m}$ and the maximum distance $l_{max}$ varies between 100 m and 1400 m. Each area contains 2 edge computing nodes, with computing frequencies of 650 and 600 respectively, and 2 idle intelligent device terminals, with computing frequencies of 200 and 150 respectively. The maximum transmit power of a device is 38 dBm and the minimum is 5 dBm. Then 100 computing tasks are randomly generated, with the computing resources required per task varying from 100 to 800. The specific simulation parameters are shown in Table 1.
TABLE 1 Simulation parameters

[Table 1 is reproduced only as an image in the original publication.]
One aspect of this embodiment shows the effect of the maximum distance between regions in the SDIoT on the average data transmission rate: when $l_{max}$ is regarded as a variable ranging from 0.1 km to 1.4 km, FIG. 4 shows the overall variation trend of the average data transmission rate obtained by the algorithms of the three specific examples. It can be seen that as the distance between regions gradually increases, the interference between regions decreases, so the average data transmission rate in the SDIoT increases. Among the three algorithms, the data of the RL algorithm has the largest spread, while the variation amplitude of the DDPG algorithm is smaller and its trend the most stable. FIG. 5 compares the results of the three algorithms at specific values of $l_{max}$ more intuitively: regardless of the magnitude of $l_{max}$, the results obtained by the DDPG algorithm of the third embodiment are the largest and those of the RL algorithm the smallest; the DQN algorithm is more stable but its results remain below those of the DDPG algorithm, indicating that the performance of the DDPG algorithm provided by the third embodiment is better than that of the other two.
In the IoT system established according to the first embodiment, one aspect of this embodiment considers that the plurality of smart device terminals of the device layer have mobility, i.e., the total number of devices per area varies. Specifically, the service offloading problem model of this embodiment regards the number of devices connected to each compute node as a variable ranging from 1 to 8. As shown in FIGS. 6 and 7, as the device density increases, the average data transmission rate decreases, i.e., the average data transmission delay in the service offloading problem model increases. The specific examples of all three algorithms exhibit similar variation trends with close magnitudes, but the DDPG algorithm provided by the third embodiment achieves a higher data transmission rate.
One aspect of this embodiment shows that in the SDIoT system provided by the first embodiment, each task that needs offloading has a maximum allowable delay $T_m$; if this threshold is exceeded during offloading, the task offloading is considered unsuccessful. The inventive solution therefore provides a set of outputs on the task offloading success rate, which can be used for self-evaluation of various embodiments of the invention. Based on the specific parameters of this embodiment, as shown in FIG. 8, when the number of tasks to be offloaded is taken as a variable ranging from 10 to 100, the task offloading success rates of all three specific examples decrease as the number of tasks increases; the decrease for the DDPG algorithm of the first specific example is smaller, and its values are consistently greater than those of the other two algorithms, indicating the better performance of the DDPG algorithm.
One aspect of this embodiment shows the technical contribution of the present invention through a comparison of the average execution time of tasks in intelligent services. Taking the number of devices in each SDIoT area as a variable over the range [25, 36, 49, 64, 81, 100], FIG. 9 compares the effects of the three specific examples: the task execution time obtained by the RL algorithm varies widely and its task execution delay is the largest; the trends of the DDPG and DQN results are stable; and although the task execution time of the DDPG algorithm gradually increases with the number of devices, its results remain the best of the three algorithms, again indicating the better performance of the DDPG algorithm.

Claims (10)

1. An Internet of things service offloading decision method based on edge computing and deep reinforcement learning, characterized in that the Internet of things is constructed as an SDIoT comprising a plurality of areas, each area comprises an area SDN controller configured with a service offloading decision model, and the area SDN controller outputs the service offloading decision of an intelligent service $\mathcal{S}$ in its area according to the service offloading decision model configured on it.
2. The internet of things service offloading decision method of claim 1, wherein: the service offloading problem of the service offloading decision model is to solve for the minimum execution delay of an intelligent service $\mathcal{S}$ and, without exceeding the total resource amounts of the offloading objects, to allocate the computing resources of the offloading objects so that the number of intelligent services that the Internet of things can execute simultaneously is maximized.
3. The internet of things service offloading decision method of claim 2, wherein: a deep reinforcement learning algorithm is adopted in the service offloading decision model to solve the service offloading problem.
4. The internet of things service offloading decision method of claim 3, wherein: the deep reinforcement learning algorithm is a DDPG algorithm that focuses on the optimal policy and on the sum of optimal rewards.
5. The internet of things service offloading decision method of claim 4, wherein: the DDPG algorithm is provided with an experience pool.
6. The internet of things service offloading decision method of claim 4, wherein: the value function of the DDPG algorithm is set to represent the total reward obtained by the agent taking action a when state s follows policy π, and the action a is the decision result of a task $K_p$ of an intelligent service $\mathcal{S}$.
7. The internet of things service offloading decision method of claim 1, wherein: the reward function of the DDPG algorithm is expressed as the difference between the time required for an intelligent service to execute entirely locally and the service execution time of the service under a decision.
8. An Internet of things service offloading decision method based on edge computing and deep reinforcement learning, characterized by comprising the following steps:
S100, establishing an SDN-based Internet of things architecture, wherein at least one layer of the SDN-based Internet of things architecture is provided with a plurality of regional SDN controllers, and the computation offloading decision algorithm of a region is dynamically operated by at least one regional SDN controller of that region;
S200, establishing a service offloading problem model based on the task offloading modes; and
S300, for the computation offloading decision of an intelligent service in an area, automatically outputting the decision by the SDN controller of the area configured with the computation offloading decision algorithm.
9. The internet of things service offloading decision method of claim 8, wherein: the Internet of things architecture comprises a cloud service main control layer, a regional SDN control layer, an edge node layer, a data layer, and a device layer; and the computation offloading decision algorithm is a deep reinforcement learning algorithm.
10. The internet of things service offloading decision method of claim 8, wherein: the computation offloading decision algorithm is also used for computing resource allocation of the region.
CN202010394958.9A 2020-05-11 2020-05-11 Internet of things service unloading decision method based on edge calculation and deep reinforcement learning Pending CN111641681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010394958.9A CN111641681A (en) 2020-05-11 2020-05-11 Internet of things service unloading decision method based on edge calculation and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010394958.9A CN111641681A (en) 2020-05-11 2020-05-11 Internet of things service unloading decision method based on edge calculation and deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN111641681A true CN111641681A (en) 2020-09-08

Family

ID=72331103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010394958.9A Pending CN111641681A (en) 2020-05-11 2020-05-11 Internet of things service unloading decision method based on edge calculation and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111641681A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112187907A (en) * 2020-09-22 2021-01-05 远光软件股份有限公司 Data processing method for edge calculation, communication method for Internet of things and electronic equipment
CN112312299A (en) * 2020-10-27 2021-02-02 国网通用航空有限公司 Service unloading method, device and system
CN112486669A (en) * 2020-11-03 2021-03-12 深圳市中博科创信息技术有限公司 Self-organizing mobile edge computing platform and method
CN112600869A (en) * 2020-11-11 2021-04-02 南京邮电大学 Calculation unloading distribution method and device based on TD3 algorithm
CN112671852A (en) * 2020-12-14 2021-04-16 北京邮电大学 Internet of things equipment computing unloading framework based on in-network computing platform
CN112887435A (en) * 2021-04-13 2021-06-01 中南大学 Method for improving task unloading cooperation rate in edge calculation
CN113407249A (en) * 2020-12-29 2021-09-17 重庆邮电大学 Task unloading method facing to position privacy protection
CN113626107A (en) * 2021-08-20 2021-11-09 中南大学 Mobile computing unloading method, system and storage medium
CN113643543A (en) * 2021-10-13 2021-11-12 北京大学深圳研究生院 Traffic flow control method and traffic signal control system with privacy protection function
CN114138373A (en) * 2021-12-07 2022-03-04 吉林大学 Edge calculation task unloading method based on reinforcement learning
CN114356535A (en) * 2022-03-16 2022-04-15 北京锦诚世纪咨询服务有限公司 Resource management method and device for wireless sensor network
CN115842731A (en) * 2022-10-31 2023-03-24 吉林大学 Configuration method of calculation resources and services of digital workshop
KR20230045486A (en) 2021-09-28 2023-04-04 숭실대학교산학협력단 Q-learning based dynamic task-offloading method in energy harvesting iot edge computing environment, recording medium and device for performing the method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379727A (en) * 2018-10-16 2019-02-22 重庆邮电大学 Task distribution formula unloading in car networking based on MEC carries into execution a plan with cooperating
CN109391681A (en) * 2018-09-14 2019-02-26 重庆邮电大学 V2X mobility prediction based on MEC unloads scheme with content caching
CN109918201A (en) * 2019-03-05 2019-06-21 中国联合网络通信集团有限公司 The control method and system of task unloading
CN110798849A (en) * 2019-10-10 2020-02-14 西北工业大学 Computing resource allocation and task unloading method for ultra-dense network edge computing
CN110941675A (en) * 2019-11-26 2020-03-31 西安交通大学 Wireless energy supply edge calculation delay optimization method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Haibo et al.: "V2X offloading and resource allocation under SDN and MEC architecture", Journal on Communications, no. 01 *
ZHANG Haibo et al.: "An offloading strategy based on software-defined networking and mobile edge computing in the Internet of Vehicles", Journal of Electronics & Information Technology, no. 03, 15 March 2020 (2020-03-15) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112187907B (en) * 2020-09-22 2023-05-23 Yuanguang Software Co., Ltd. Data processing method for edge computing, communication method for the Internet of things, and electronic device
CN112187907A (en) * 2020-09-22 2021-01-05 Yuanguang Software Co., Ltd. Data processing method for edge computing, communication method for the Internet of things, and electronic device
CN112312299A (en) * 2020-10-27 2021-02-02 State Grid General Aviation Co., Ltd. Service offloading method, device and system
CN112486669B (en) * 2020-11-03 2022-03-18 Shenzhen Zhongbo Kechuang Information Technology Co., Ltd. Self-organizing mobile edge computing platform and method
CN112486669A (en) * 2020-11-03 2021-03-12 Shenzhen Zhongbo Kechuang Information Technology Co., Ltd. Self-organizing mobile edge computing platform and method
CN112600869A (en) * 2020-11-11 2021-04-02 Nanjing University of Posts and Telecommunications Computation offloading and allocation method and device based on the TD3 algorithm
CN112671852A (en) * 2020-12-14 2021-04-16 Beijing University of Posts and Telecommunications Computation offloading framework for Internet-of-things devices based on an in-network computing platform
CN113407249B (en) * 2020-12-29 2022-03-22 Chongqing University of Posts and Telecommunications Task offloading method for location privacy protection
CN113407249A (en) * 2020-12-29 2021-09-17 Chongqing University of Posts and Telecommunications Task offloading method for location privacy protection
CN112887435A (en) * 2021-04-13 2021-06-01 Central South University Method for improving the task offloading cooperation rate in edge computing
CN113626107A (en) * 2021-08-20 2021-11-09 Central South University Mobile computation offloading method, system and storage medium
CN113626107B (en) * 2021-08-20 2024-03-26 Central South University Mobile computation offloading method, system and storage medium
KR20230045486A (en) 2021-09-28 2023-04-04 Foundation of Soongsil University-Industry Cooperation Q-learning based dynamic task-offloading method in energy harvesting IoT edge computing environment, recording medium and device for performing the method
CN113643543A (en) * 2021-10-13 2021-11-12 Peking University Shenzhen Graduate School Traffic flow control method and traffic signal control system with privacy protection
CN113643543B (en) * 2021-10-13 2022-01-11 Peking University Shenzhen Graduate School Traffic flow control method and traffic signal control system with privacy protection
CN114138373A (en) * 2021-12-07 2022-03-04 Jilin University Edge computing task offloading method based on reinforcement learning
CN114138373B (en) * 2021-12-07 2023-10-24 Jilin University Edge computing task offloading method based on reinforcement learning
CN114356535A (en) * 2022-03-16 2022-04-15 Beijing Jincheng Shiji Consulting Service Co., Ltd. Resource management method and device for wireless sensor networks
CN115842731A (en) * 2022-10-31 2023-03-24 Jilin University Configuration method for computing resources and services in a digital workshop

Similar Documents

Publication Publication Date Title
CN111641681A (en) Internet of things service unloading decision method based on edge calculation and deep reinforcement learning
Sundararaj Optimal task assignment in mobile cloud computing by queue based ant-bee algorithm
Wu et al. EEDTO: An energy-efficient dynamic task offloading algorithm for blockchain-enabled IoT-edge-cloud orchestrated computing
Zhou et al. Deep reinforcement learning for energy-efficient computation offloading in mobile-edge computing
Sun et al. Multi-objective optimization of resource scheduling in fog computing using an improved NSGA-II
Li et al. Energy-aware task offloading with deadline constraint in mobile edge computing
CN113836796B (en) Cloud-edge cooperation-based power distribution Internet of things data monitoring system and scheduling method
Yao et al. Caching in dynamic IoT networks by deep reinforcement learning
CN109151864A (en) Migration decision and optimal resource allocation method for mobile edge computing in ultra-dense networks
Jiang et al. A Q-learning based method for energy-efficient computation offloading in mobile edge computing
Ren et al. Multi-objective optimization for task offloading based on network calculus in fog environments
CN112312299A (en) Service offloading method, device and system
Lan et al. Deep reinforcement learning for computation offloading and caching in fog-based vehicular networks
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Li et al. Task computation offloading for multi-access edge computing via attention communication deep reinforcement learning
Devi et al. Optimization techniques for spectrum handoff in cognitive radio networks using cluster based cooperative spectrum sensing
Zhang et al. Adaptive digital twin server deployment for dynamic edge networks in IoT system
Xu et al. Trusted collaboration for mec-enabled vr video streaming: A multi-agent reinforcement learning approach
CN117149351A (en) Prediction-based edge collaborative computing migration method and system
CN116017570A (en) Edge computing system resource management method based on block chain
CN114385359B (en) Cloud edge task time sequence cooperation method for Internet of things
CN114615705B (en) Single-user resource allocation strategy method based on 5G network
CN116339748A (en) Self-adaptive application program deployment method in edge computing network based on mobility prediction
Wu et al. Reinforcement learning for communication load balancing: approaches and challenges
Nagasundaram et al. Analysis of the requirement and artificial intelligence-based resource management system in cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200908