CN117202265A - DQN-based service migration method in edge environment - Google Patents


Info

Publication number
CN117202265A
Authority
CN
China
Prior art keywords
mec
time slot
mobile user
migration
service
Prior art date
Legal status
Pending
Application number
CN202311287891.9A
Other languages
Chinese (zh)
Inventor
陈哲毅
黄思进
刘浩
文佳
曾旺
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks
    • Y02D30/70 — Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a DQN-based service migration method in an edge environment to explore optimal service migration policies. The proposed DQNSM method achieves faster convergence and a better converged result by employing an experience replay pool and a target network. A large number of simulation experiments based on a real-world urban vehicle trajectory dataset verify the feasibility and effectiveness of the proposed DQNSM method. Compared with 4 baseline methods, DQNSM can quickly make service migration decisions and obtains the best results under different MEC environment parameter settings.

Description

DQN-based service migration method in edge environment
Technical Field
The invention belongs to the technical field of 5G communication networks, mobile edge computing and service migration, and particularly relates to a DQN-based service migration method in an edge environment.
Background
With the rapid development of 5G communication technology and the popularization of mobile devices, a great number of intelligent applications are continuously emerging. Such intelligent applications are often accompanied by computation-intensive tasks that support their high-quality intelligent services, which places significant demands on the processor performance of mobile devices. Cloud computing is considered an efficient, economical, and scalable computing paradigm: by deploying server clusters in remote data centers, it provides rich computing and storage resources. Therefore, when facing the limited performance of a mobile device, the computation-intensive tasks it generates can be offloaded to the remote cloud, gaining access to abundant cloud computing resources that supplement the device's computing capacity. However, the long transmission distance between the mobile device and the cloud data center makes the task response delay too long, which stalls mobile services. Moreover, cloud computing requires user data to be stored in the remote cloud, which exposes users to security issues.
Compared to cloud computing, mobile edge computing (Mobile Edge Computing, MEC) deploys computing and storage resources at the network edge, providing low-latency, high-bandwidth, and high-reliability services. A mobile device may use the MEC for computation offloading, transmitting its computation-intensive tasks to an MEC server for processing. Because the MEC server is closer to the user than the cloud server, data transmission is faster and more stable, and the task response delay is significantly reduced. Meanwhile, user data does not need to be uploaded to the remote cloud for processing, so the safety of user data is guaranteed to a certain extent. Furthermore, virtualization is considered one of the key technologies for resource management and allocation in MEC. When a user selects a particular MEC server for computation offloading, the MEC server creates a service instance for the user with virtualization techniques and allocates corresponding computing resources to it, providing computing services for the user; these service instances execute in parallel on a shared hardware platform.
Considering the mobility of users and the limited communication coverage of edge servers, a single MEC server can hardly provide uninterrupted computing services for a given user. Meanwhile, because the computing resources of an MEC server are limited, a single MEC server can hardly carry large-scale task computation, which seriously affects quality of service (QoS). To solve these problems, it is common to migrate the service instance provided to a user (including runtime data, user state context, and other information) to a new MEC server according to the user's mobility, thereby achieving more efficient resource utilization and collaboration and guaranteeing the user's QoS.
Existing research efforts on service migration are mostly based on mathematical optimization or heuristic methods. These methods typically require prior knowledge of the system to formulate the corresponding service migration policies, but such knowledge may be difficult to obtain in practical, dynamic, and diverse edge systems; their performance is also susceptible to environmental changes, requiring continual adjustment of model parameters to adapt to new environments. Furthermore, these methods typically require a large number of iterative computations or searches in a large solution space, which leads to high computational overhead. Thus, for the service migration problem in dynamic edge systems, more efficient solutions need to be explored to achieve effective adaptive service migration. Reinforcement learning (Reinforcement Learning, RL) is a promising decision-making method in which an agent gradually adjusts its strategy through interactive learning with the environment to obtain the maximum cumulative reward. However, applying conventional reinforcement learning methods to the service migration problem in an edge environment faces the problem of a high-dimensional state space.
Disclosure of Invention
Mobile edge computing (Mobile Edge Computing, MEC) provides low-latency, high-bandwidth services by deploying computing and storage resources at the network edge. A mobile device can offload computing tasks to an MEC server for processing to meet the real-time requirements of intelligent applications. Due to the mobility of users and the limited communication coverage of MEC servers, a single MEC server can hardly provide uninterrupted service to a user, which seriously affects quality of service (Quality of Service, QoS). Thus, a user's service instance needs to migrate dynamically as the user moves to better guarantee QoS. Existing solutions generally rely on prior knowledge of the system, can hardly solve the service migration problem in a dynamic edge environment effectively, and easily incur excessive iterative computation overhead. To address these challenges, the present invention proposes a novel Deep-Q-Network-based Service Migration (DQNSM) method to explore an optimal service migration strategy. The proposed DQNSM method achieves faster convergence and a better converged result by employing an experience replay pool and a target network. A large number of simulation experiments based on a real-world urban vehicle trajectory dataset verify the feasibility and effectiveness of the proposed DQNSM method. Compared with 4 baseline methods, DQNSM can quickly make service migration decisions and obtains the best results under different MEC environment parameter settings.
To solve this problem, the present invention proposes a service migration method based on a deep Q network (Deep Q-Network-based Service Migration, DQNSM) to obtain an optimal service migration policy in an edge environment. Meanwhile, the DQNSM method utilizes deep neural networks (Deep Neural Networks, DNNs) to address the high-dimensional state space problem faced by conventional RL methods. Further, by setting up an experience replay pool and a target network, the DQNSM method achieves faster convergence and a better converged result.
The invention designs a unified service migration model for a complex dynamic MEC system, takes long-term QoS as the optimization target, and measures delay in terms of migration, communication, computation, and so on. Accordingly, based on the deep reinforcement learning (Deep Reinforcement Learning, DRL) framework, the state space, action space, and reward function of the service migration problem in the MEC environment are defined, and the problem is formalized as a Markov decision process (Markov Decision Process, MDP). A DQN-based service migration method (DQNSM) is then provided; the DQNSM method adopts an experience replay pool and a target network, and can therefore achieve faster convergence and a better converged result. On a real-world urban vehicle trajectory dataset, the DQNSM method achieves the best performance under different MEC environment parameter settings.
The technical scheme adopted for solving the technical problems is as follows:
the DQN-based service migration method in an edge environment takes long-term QoS as the optimization target, defines the state space, action space, and reward function of the service migration problem in the MEC environment based on a deep reinforcement learning framework, and formalizes the optimization problem as a Markov decision process; the optimization problem is solved with the DQN-based service migration method DQNSM, which employs an experience replay pool and a target network to achieve faster convergence and a better converged result.
Further, the MEC system is composed of M MEC servers, U mobile users, and one MEC controller, where the M MEC servers are denoted as the set M = {1,2,...,m,...,M} and the U mobile users as the set U = {1,2,...,u,...,U}; mobile user u sends task processing requests to its service instance through the 5G network, the MEC server is responsible for receiving and forwarding task requests to the corresponding service instance, and the MEC controller is responsible for collecting and executing the service migration decisions of all mobile users;
the MEC system operates in discrete time slots, and the location of mobile user u changes at the beginning of each time slot t ∈ {0,1,2,...,T}; to support the running needs of intelligent applications, a computation-intensive task Task_t(u) is generated at each time slot t;
at time slot t_0 the device of mobile user u accesses the MEC system and creates a service instance on its nearest MEC server, and this service instance continues to provide computing services for mobile user u in later time slots; after receiving a task request, the service instance on the MEC server processes the task and returns the result to the user device; whether sending task data or receiving results, the device of mobile user u always communicates with its nearest MEC server; the MEC servers are interconnected by stable backhaul links;
the MEC server to which the mobile subscriber u is connected is denoted as time slot tThe MEC server where the service instance is located is denoted +.>How to reduce long-term system delay and guarantee QoS by selecting proper service migration time nodes and destinations, and taking the delay as a measure index of QoS to define and form corresponding optimization problems.
Further, in the migration model, e_{t-1}(u) is defined as the MEC server where the service instance of mobile user u is located at time slot t-1, and a_t(u) ∈ M is defined as the service migration decision of mobile user u at time slot t; that is, at time slot t, the service instance corresponding to mobile user u is migrated to MEC server a_t(u) according to a_t(u); d_t(u) is used to measure the hop distance between e_{t-1}(u) and a_t(u); if e_{t-1}(u) = a_t(u), i.e. d_t(u) = 0, no migration is required at this time; otherwise, the service instance needs to be moved from MEC server e_{t-1}(u) to a_t(u), generating a corresponding migration delay; the migration delay is a monotonically non-decreasing function of d_t(u); the service migration delay of mobile user u at time slot t is defined as follows:

MT_t(u) = σ_m · d_t(u) · S_t(u)/η, (1)

where S_t(u) represents the data size of the service instance corresponding to mobile user u, η represents the network bandwidth, and σ_m represents the migration delay coefficient per unit hop distance;
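For illustration, the per-slot migration delay can be sketched in Python. This is a minimal sketch assuming the form MT_t(u) = σ_m · d_t(u) · S_t(u)/η; the function name and all numeric values are illustrative, not taken from the patent:

```python
def migration_delay(hop_distance: int, instance_size: float,
                    bandwidth: float, sigma_m: float) -> float:
    """Service migration delay MT_t(u): zero when the instance stays on
    the same MEC server (d_t(u) = 0), otherwise monotonically
    non-decreasing in the hop distance d_t(u) and instance size S_t(u)."""
    if hop_distance == 0:          # e_{t-1}(u) == a_t(u): no migration
        return 0.0
    return sigma_m * hop_distance * instance_size / bandwidth

# Example: a 50 MB instance over a 100 MB/s backhaul with sigma_m = 1.0
print(migration_delay(0, 50.0, 100.0, 1.0))   # staying put costs nothing
print(migration_delay(2, 50.0, 100.0, 1.0))   # two hops: 1.0 * 2 * 50/100 = 1.0
```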
after completing the service migration, the device of mobile user u offloads Task_t(u) to the corresponding service instance for processing, generating a corresponding communication delay; the communication delay consists of the wireless access delay between the local device of mobile user u and MEC server c_t(u), and the backhaul link transmission delay between c_t(u) and e_t(u):
first, consider the wireless access delay between the local device of mobile user u and MEC server c_t(u); at time slot t, the signal-to-noise ratio between the local device of mobile user u and c_t(u) is defined as follows:

SNR_t(u) = P_u · g_t(u) / σ², (2)

where P_u represents the transmit power of the local device of mobile user u, g_t(u) represents the channel gain between the local device of mobile user u and c_t(u), and σ² represents the average power of the Gaussian noise; the channel gain g_t(u) depends on the distance between the two and is formally defined as follows:
g_t(u) = α / l_t(u)², (3)

where α represents the channel gain at unit distance and l_t(u) represents the actual distance between the local device of mobile user u and c_t(u); the total bandwidth of an MEC server is defined as B, and OFDM is adopted to allocate the bandwidth in an orthogonal manner to the user devices within the coverage area at each time slot t; the wireless uplink transmission rate of the local device of mobile user u at time slot t is defined as follows:

R_t(u) = B_t(u) · log₂(1 + SNR_t(u)), (4)

where B_t(u) represents the bandwidth available to the local device of mobile user u at time slot t;
thus, the wireless access delay between the local device of mobile user u and c_t(u) is defined as follows:

PT_t(u) = D_t(u) / R_t(u), (5)

where D_t(u) represents the data size of the offloaded task;
when c_t(u) ≠ e_t(u), the task data D_t(u) needs to be transmitted over the backhaul links between the MEC servers, at which point a corresponding backhaul link transmission delay occurs; this delay depends on the data size of D_t(u) and the hop distance between the two servers, measured by y_t(u); the backhaul link transmission delay between c_t(u) and e_t(u) is defined as follows:

ST_t(u) = σ_bh · y_t(u) · D_t(u)/η, (6)

where D_t(u) represents the data size of the offloaded task, η represents the network bandwidth of the backhaul link, and σ_bh represents the transmission delay coefficient per unit hop distance;
thus, the communication delay of mobile user u at time slot t is defined as follows:
HT_t(u) = PT_t(u) + ST_t(u). (7)
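The communication model can be sketched end to end. This is a minimal sketch assuming SNR_t(u) = P_u·g_t(u)/σ², a Shannon-capacity uplink rate R_t(u) = B_t(u)·log2(1 + SNR_t(u)), PT_t(u) = D_t(u)/R_t(u), and ST_t(u) = σ_bh·y_t(u)·D_t(u)/η; all function names and numbers are illustrative assumptions:

```python
import math

def wireless_access_delay(data_size, bandwidth_share, tx_power, gain, noise_power):
    """PT_t(u) = D_t(u) / R_t(u), with uplink rate
    R_t(u) = B_t(u) * log2(1 + SNR_t(u)) and SNR_t(u) = P_u * g_t(u) / sigma^2."""
    snr = tx_power * gain / noise_power
    rate = bandwidth_share * math.log2(1.0 + snr)
    return data_size / rate

def backhaul_delay(data_size, hops, backhaul_bw, sigma_bh):
    """ST_t(u): zero when the serving and connected MEC servers coincide (hops = 0)."""
    return sigma_bh * hops * data_size / backhaul_bw

def communication_delay(data_size, bandwidth_share, tx_power, gain,
                        noise_power, hops, backhaul_bw, sigma_bh):
    """HT_t(u) = PT_t(u) + ST_t(u), as in equation (7)."""
    return (wireless_access_delay(data_size, bandwidth_share, tx_power, gain, noise_power)
            + backhaul_delay(data_size, hops, backhaul_bw, sigma_bh))

# Illustrative numbers: an SNR of 3 gives log2(4) = 2 bits/s/Hz.
pt = wireless_access_delay(data_size=20.0, bandwidth_share=5.0,
                           tx_power=3.0, gain=1.0, noise_power=1.0)
print(pt)  # 20 / (5 * 2) = 2.0
print(communication_delay(20.0, 5.0, 3.0, 1.0, 1.0, hops=2,
                          backhaul_bw=100.0, sigma_bh=1.0))  # 2.0 + 1*2*20/100 = 2.4
```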
in time slot t, the device of mobile user u offloads its task to the corresponding service instance for processing, and the MEC server allocates computing resources to the users' service instances using resource virtualization technology to support their parallel operation; the number of CPU cycles required to compute Task_t(u) is defined as follows:

C_t(u) = D_t(u) · K_t(u), (8)

where K_t(u) represents the computation density of the task;
V_t(m) is used to measure the total load of the target server e_t(u) at time slot t, and F represents the computing capacity of the MEC server; a weighted resource allocation strategy is adopted, i.e. the computing resources allocated to a service instance are proportional to the number of CPU cycles required by its task in this time slot, so that mobile user u is allocated F · C_t(u)/V_t(m), where V_t(m) sums the cycle demands of the users served by m; the computation delay of mobile user u at time slot t is defined as follows:

CT_t(u) = C_t(u) / (F · C_t(u)/V_t(m)) = V_t(m)/F. (9)
further, based on the system model definition, the goal of the proposed MEC system over the time horizon T is to minimize the long-term system delay, including the sum of the migration delay, communication delay, and computation delay of all mobile users; the optimization problem P1 is formalized as follows:

P1: min Σ_{t=1}^{T} Σ_{u=1}^{U} ( MT_t(u) + HT_t(u) + CT_t(u) ). (10)
For the optimization problem P1, DQNSM is adopted to minimize the long-term system delay in the MEC system; the MEC system is regarded as the environment, and the DRL agent selects corresponding actions by interacting with the environment, where the state space, action space, and reward function are defined as follows;
state space: the state at time slot t consists of the load situation V_t of all MEC servers in the region, the location Loc_t(u) of mobile user u represented by a two-dimensional vector, the data volume D_t(u) and computation density K_t(u) of the task Task_t(u) generated by the intelligent application, the data size S_t(u) of the corresponding service instance, and the MEC server e_{t-1}(u) where the service instance was located at time slot t-1; thus, the state space at time slot t is expressed as:

s_t = { V_t, Loc_t(u), D_t(u), K_t(u), S_t(u), e_{t-1}(u) }. (11)
action space: at time slot t, the service instance corresponding to mobile user u is allowed to migrate to any MEC server in the region; therefore, the action space at time slot t is expressed as:

a_t ∈ {1,2,...,m,...,M}. (12)
reward function: the optimization objective is to minimize the long-term system delay; thus, at time slot t, the instantaneous reward of the system is negatively related to the sum of the migration delay, communication delay, and computation delay of the user; the reward function is defined as follows:

r_t = -(MT_t(u) + HT_t(u) + CT_t(u)). (13)
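The reward of equation (13) and the cumulative objective it induces can be sketched as follows; this is a minimal sketch, and the discount factor value and all delay numbers are illustrative assumptions:

```python
def reward(mt: float, ht: float, ct: float) -> float:
    """r_t = -(MT_t(u) + HT_t(u) + CT_t(u)), equation (13):
    the negative of the slot's total delay."""
    return -(mt + ht + ct)

def discounted_return(rewards, gamma=0.9):
    """Return G_0 = sum_t gamma^t * r_t that the DRL agent maximizes;
    with r_t as above, maximizing it minimizes the (discounted)
    long-term system delay of problem P1."""
    g = 0.0
    for r in reversed(rewards):    # backward accumulation: g = r + gamma * g
        g = r + gamma * g
    return g

slot_delays = [(0.0, 2.0, 1.0), (1.0, 0.5, 1.0), (0.0, 0.5, 1.0)]  # (MT, HT, CT)
rs = [reward(*d) for d in slot_delays]
print(rs)                       # [-3.0, -2.5, -1.5]
print(discounted_return(rs))    # -3.0 + 0.9*(-2.5) + 0.81*(-1.5) = -6.465
```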
in the service migration optimization process of the MEC system, the DRL agent deployed on the device of mobile user u selects an action a_t in the current system state s_t according to the learned policy; the MEC controller collects the service migration decisions of all users and executes the service migration; the environment then feeds back an instantaneous reward r_t and transitions from state s_t to state s_{t+1}.
The DQNSM method comprises the following steps:
based on the definitions of the state space in equation (11), the action space in equation (12), and the reward function in equation (13), first initialize the Q network and its network parameters θ; then initialize the target Q̂ network and copy the parameters θ of the Q network to the target Q̂ network as θ⁻; meanwhile, initialize the experience replay pool R, the number of training episodes E, and the maximum time slot T of each episode; by setting the experience replay pool R, the correlation of training samples is broken, which alleviates instability in the training process; in each training episode, after mobile user u accesses the MEC system, a corresponding service instance is created on its nearest MEC server, and at each time slot t thereafter the system state s_t is input into the Q network, which generates the service migration decision a_t; the instantaneous reward r_t is calculated according to equation (13) and the system transitions to the next state s_{t+1}; the current experience sample (s_t, a_t, r_t, s_{t+1}) is stored in R, and once the number of samples stored in R reaches N, N samples are randomly drawn from R for training the network parameters; the next-state action values are calculated from the N randomly sampled experiences, and the current instantaneous reward r_t is combined with the discounted future value to form the fitting target y_t, i.e.

y_t = r_t + γ · max_{a'} Q̂(s_{t+1}, a'; θ⁻);

the loss function of the Q network is defined as the mean squared error and minimized using the Adam gradient descent algorithm, the purpose of which is to let the output of the Q network approximate y_t, i.e.

L(θ) = (1/N) Σ ( y_t - Q(s_t, a_t; θ) )².
The updated Q network is better able to fit Q(s_t, a_t); when the number of updates of the Q network reaches λ, the parameters of the Q network are copied to the target Q̂ network.
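The training procedure can be sketched in simplified form. The patent's DQNSM uses a deep neural network trained with Adam; the sketch below substitutes a lookup table, a plain TD step, and an ε-greedy action choice (an assumption not spelled out in the text) so that it stays self-contained, and it only illustrates the experience-replay-pool and target-network mechanics on a hypothetical toy environment:

```python
import random

class ReplayPool:
    """Experience replay pool R: stores (s, a, r, s') samples and draws
    uniform random minibatches to break the correlation between
    consecutive time slots."""
    def __init__(self, capacity=1000):
        self.buf, self.capacity = [], capacity
    def store(self, sample):
        if len(self.buf) >= self.capacity:
            self.buf.pop(0)                    # drop the oldest sample
        self.buf.append(sample)
    def sample(self, n):
        return random.sample(self.buf, n)

def train_dqnsm_tabular(step, n_states, n_actions, episodes=200, T=30,
                        gamma=0.9, lr=0.1, batch=16, sync_every=20, eps=0.1):
    """Skeleton of the DQNSM loop: act, store experience, sample a
    minibatch, regress Q toward y_t = r + gamma * max Q_target(s', .),
    and periodically copy the Q table to the target table."""
    Q = [[0.0] * n_actions for _ in range(n_states)]       # "Q network"
    Q_target = [row[:] for row in Q]                       # target network
    pool, updates = ReplayPool(), 0
    for _ in range(episodes):
        s = 0                                              # initial state
        for _ in range(T):
            # epsilon-greedy migration decision a_t from Q(s_t, .)
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            r, s2 = step(s, a)                             # environment transition
            pool.store((s, a, r, s2))
            if len(pool.buf) >= batch:
                for (si, ai, ri, s2i) in pool.sample(batch):
                    y = ri + gamma * max(Q_target[s2i])    # TD target y_t
                    Q[si][ai] += lr * (y - Q[si][ai])      # gradient-like step
                updates += 1
                if updates % sync_every == 0:              # copy theta -> theta^-
                    Q_target = [row[:] for row in Q]
            s = s2
    return Q

# Hypothetical 2-state, 2-action environment where action 1 ("migrate")
# always yields the smaller delay, i.e. the larger reward.
def toy_step(s, a):
    return (1.0 if a == 1 else 0.0), (s + a) % 2

Q = train_dqnsm_tabular(toy_step, n_states=2, n_actions=2)
print(Q[0][1] > Q[0][0])  # migrating (action 1) should score higher after training
```

In the full method the table lookup is replaced by a DNN forward pass and the TD step by an Adam update of the network parameters, but the replay sampling and the periodic parameter copy every λ updates are exactly the mechanics shown here.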
The use process of the finally formed scheme is as follows:
(1) The DRL agent generates service migration decisions according to information such as the load conditions of all MEC servers, the positions of the mobile users, the data volume and computation density of the tasks generated by the intelligent applications, the data size of the corresponding service instances, and the MEC server where each service instance was located in the previous time slot.
(2) The MEC controller collects the service migration decisions of all users and issues service migration commands to each MEC server to complete the migration of the service instances.
(3) The MEC server allocates computing resources to the service instances running on it according to the weighted resource allocation policy, to support their parallel operation.
(4) The MEC server receives or forwards task requests to the service instance corresponding to each user for processing, according to the result of the service migration decision.
(5) During the service migration process, the state of each time slot, the action taken, the reward obtained, and the new state transitioned to are recorded, and the DRL agent generates corresponding service migration decisions according to this information.
Compared with the prior art, the DQNSM method provided by the invention and its optimized scheme address the service migration problem in the MEC environment and can generate appropriate service migration decisions according to the MEC network environment, so as to maximize system performance and guarantee the QoS of users. Meanwhile, the DQNSM method breaks the Markov correlation among data samples by setting an experience replay pool, and further alleviates the network oscillation problem in the training process by setting a target network. A large number of simulation experiments verify the feasibility and effectiveness of the DQNSM method, which can make service migration decisions quickly compared with the other baseline methods (AM, NM, PM, Greedy). Under different MEC environment parameter settings (bandwidth of the MEC server, computing capacity of the MEC server, migration coefficient per unit hop distance, and number of users), the DQNSM method obtains the best performance, and the QoS of users is well guaranteed.
Drawings
The invention is described in further detail below with reference to the attached drawings and detailed description:
fig. 1 is a diagram illustrating service migration in the MEC environment according to an embodiment of the present invention;
fig. 2 is an overview of the DQNSM method proposed by an embodiment of the invention;
fig. 3 is a comparison of the present invention for different discount factors;
fig. 4 is a diagram showing the effect of network bandwidth on different methods according to an embodiment of the present application;
fig. 5 is a diagram showing the effect of MEC server computation frequency on different methods according to an embodiment of the present application;
fig. 6 is a graph showing the effect of the migration coefficient per unit hop distance on different methods according to an embodiment of the present application;
fig. 7 is a graph showing the effect of the number of users on different methods according to an embodiment of the present application.
Detailed Description
In order to make the features and advantages of the present patent more comprehensible, embodiments accompanied with figures are described in detail below:
it should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
As shown in fig. 1, the proposed MEC system consists of M MEC servers, U mobile users, and one MEC controller, where the M MEC servers are denoted as the set M = {1,2,...,m,...,M} and the U mobile users as the set U = {1,2,...,u,...,U}. Mobile user u sends task processing requests to its service instance through the 5G network, the MEC server is responsible for receiving and forwarding task requests to the corresponding service instance, and the MEC controller is responsible for collecting and executing the service migration decisions of all mobile users.
Specifically, the MEC system operates in discrete time slots, and the location of mobile user u changes at the beginning of each time slot t ∈ {0,1,2,...,T}. To support the running needs of intelligent applications (e.g., autonomous driving, object detection, path planning, etc.), a computation-intensive task Task_t(u) is generated at each time slot t to better serve the user. Because the computing power of the local device of mobile user u is limited, these task requests need to be continuously offloaded to the MEC server for processing. At time slot t_0, the device of mobile user u accesses the MEC system and creates a service instance on its nearest MEC server, which continues to provide computing services for mobile user u in the following time slots. After receiving a task request, the service instance on the MEC server processes the task and returns the result to the user device. Whether sending task data or receiving results, the device of mobile user u always communicates with its nearest MEC server. The MEC servers are interconnected by stable backhaul links. Therefore, when the MEC server where the service instance corresponding to mobile user u is located is not the same as the MEC server currently connected to mobile user u, the user can still access its service instance through multi-hop communication between the servers. Each MEC server has limited computing power, and multiple factors must be considered simultaneously to determine whether to migrate the service.
The invention denotes the MEC server connected to mobile user u at time slot t as c_t(u), and the MEC server where its service instance is located as e_t(u). As shown in fig. 1, when the user moves to a new location, away from the original MEC server, the user is within the communication coverage of a new MEC server. Without service migration, task data would be transmitted over the backhaul links, which increases the task transmission delay. This delay can be avoided if the mobile user's service instance is migrated to the nearest MEC server, but migrating the service data also introduces an additional migration delay. Thus, how to reduce the long-term system delay and guarantee QoS by selecting appropriate service migration time nodes and destinations remains a challenging problem. To solve this problem, the invention models the service migration problem in the MEC system, takes delay as the QoS measurement index, and defines the corresponding optimization problem.
1 Migration Model
e_{t-1}(u) is defined as the MEC server where the service instance of mobile user u is located at time slot t-1, and a_t(u) ∈ M is defined as the service migration decision of mobile user u at time slot t; that is, at time slot t, the service instance corresponding to mobile user u is migrated to MEC server a_t(u) according to a_t(u). The invention uses d_t(u) to measure the hop distance between e_{t-1}(u) and a_t(u). If e_{t-1}(u) = a_t(u), i.e., d_t(u) = 0, no migration is required at this time; otherwise, the service instance needs to be moved from MEC server e_{t-1}(u) to a_t(u), and a corresponding migration delay is generated. Typically, the migration delay is caused by the service interruption during migration and increases with growing service data and hop distance; it includes the transmission delay of the service data on the backhaul links, the queuing delay at intermediate MEC servers, and the processing delay at a_t(u). Thus, the migration delay is a monotonically non-decreasing function of d_t(u). The service migration delay of mobile user u at time slot t is defined as follows:

MT_t(u) = σ_m · d_t(u) · S_t(u)/η, (1)

where S_t(u) represents the data size of the service instance corresponding to mobile user u, η represents the network bandwidth, and σ_m represents the migration delay coefficient per unit hop distance (i.e., the coefficient on the time taken to transmit S_t(u) over one hop).
2 Communication Model
After completing the service migration, the device of mobile user u offloads Task_t(u) to the corresponding service instance for processing, which incurs a corresponding communication delay. The communication delay consists of the wireless access delay between the local device of mobile user u and MEC server c_t(u), and the backhaul link transmission delay between c_t(u) and e_t(u).
First, consider the wireless access delay between the local device of mobile user u and MEC server c_t(u). At time slot t, the signal-to-noise ratio between the local device of mobile user u and c_t(u) is defined as follows:

SNR_t(u) = P_u · g_t(u) / σ², (2)

where P_u represents the transmit power of the local device of mobile user u, g_t(u) represents the channel gain between the local device of mobile user u and c_t(u), and σ² represents the average power of the Gaussian noise. The channel gain g_t(u) depends on the distance between the two and is formally defined as follows:
g_t(u) = α / l_t(u)², (3)

where α represents the channel gain at unit distance and l_t(u) represents the actual distance between the local device of mobile user u and c_t(u). The invention defines the total bandwidth of an MEC server as B and adopts the orthogonal frequency division multiplexing (Orthogonal Frequency Division Multiplexing, OFDM) multiple access technique, allocating the bandwidth in an orthogonal manner to the user devices within the coverage area at each time slot t. Thus, the wireless uplink transmission rate of the local device of mobile user u at time slot t is defined as follows:

R_t(u) = B_t(u) · log₂(1 + SNR_t(u)), (4)

where B_t(u) represents the bandwidth available to the local device of mobile user u at time slot t.
Thus, the wireless access delay between the local device of mobile user u and c_t(u) is defined as follows:

PT_t(u) = D_t(u) / R_t(u), (5)

where D_t(u) represents the data size of the offloaded task.
In particular, when c_t(u) ≠ e_t(u), the task data D_t(u) needs to be transmitted over the backhaul links between the MEC servers, at which point a corresponding backhaul link transmission delay occurs. This delay mainly depends on the data size of D_t(u) and the hop distance between the two servers; similar to equation (1), y_t(u) measures the hop distance between the two. Thus, the backhaul link transmission delay between c_t(u) and e_t(u) is defined as follows:

ST_t(u) = σ_bh · y_t(u) · D_t(u)/η, (6)

where D_t(u) represents the data size of the offloaded task, η represents the network bandwidth of the backhaul link, and σ_bh represents the transmission delay coefficient per unit hop distance (the time coefficient for transmitting over one hop).
Thus, the communication delay of mobile user u at time slot t is defined as follows:

$$HT_t(u) = PT_t(u) + ST_t(u). \qquad (7)$$
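Combining equations (5)–(7), the per-slot communication delay can be sketched as follows. The additive form of the backhaul term (transmission time plus a per-hop latency) is an assumption consistent with the definitions of $\eta$ and $\sigma_{bh}$ above.

```python
def communication_delay(data_bits, access_rate, eta, sigma_bh, hops):
    """Sketch of Eqs. (5)-(7): HT = PT + ST, where the backhaul term ST is
    zero when the connected and serving servers coincide (hops == 0).
    Parameter names and the ST form are illustrative assumptions."""
    pt = data_bits / access_rate                                   # Eq. (5): access delay
    st = 0.0 if hops == 0 else data_bits / eta + sigma_bh * hops   # Eq. (6), assumed form
    return pt + st                                                 # Eq. (7)
```

For example, with no backhaul hops only the access delay remains; each extra hop adds a fixed $\sigma_{bh}$ latency on top of the one-time backhaul transmission.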
3 Computation model
In time slot t, the device of mobile user u offloads its task to the corresponding service instance for processing, and the MEC server allocates computing resources to the user's service instance via resource virtualization to support parallel operation. The number of CPU cycles required to compute Task_t(u) is defined as follows:

$$C_t(u) = D_t(u)\, K_t(u), \qquad (8)$$

where $K_t(u)$ represents the computation density of the task (i.e., the number of CPU cycles required per bit of data).
In addition, the invention uses $V_t(m)$ to measure the total load of target server m at time slot t, and F denotes the computing capacity (i.e., CPU frequency) of an MEC server. Preferably, a weighted resource-allocation strategy is adopted: the computing resources allocated to a service instance are proportional to the number of CPU cycles its task requires in this slot, i.e., $f_t(u) = \frac{C_t(u)}{V_t(e_t^s(u))} F$. Thus, the computation delay of mobile user u at time slot t is defined as follows:

$$CT_t(u) = \frac{C_t(u)}{f_t(u)} = \frac{V_t(e_t^s(u))}{F}. \qquad (9)$$
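Under the weighted allocation of equation (9), every instance on a server finishes in load/F seconds; a minimal sketch (all arguments illustrative):

```python
def compute_delay(task_cycles, server_load_cycles, capacity_hz):
    """Sketch of Eqs. (8)-(9): with weighted allocation, the share of the
    server frequency F given to an instance is proportional to its task's
    CPU cycles, so the delay collapses to total_load / F for every instance."""
    share = task_cycles / server_load_cycles * capacity_hz  # allocated frequency f_t(u)
    return task_cycles / share                              # equals server_load_cycles / capacity_hz
```

Note the design consequence: the per-user delay depends only on the chosen server's total load relative to its capacity, which is exactly what the load component $V_t$ of the state space lets the agent observe.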
4 Optimization objective
Based on the above system model definition, the goal of the proposed MEC system over the time horizon T is to minimize the long-term system delay, i.e., the sum of the migration, communication, and computation delays of all mobile users. The optimization problem P1 is formalized as follows:

$$P1: \quad \min \sum_{t=1}^{T} \sum_{u=1}^{U} \left( MT_t(u) + HT_t(u) + CT_t(u) \right). \qquad (10)$$
As mobile user u moves, if no service migration is performed, the tasks sent by the user device at each time slot t must traverse the backhaul links between MEC servers, incurring high communication delay. Conversely, if the service frequently follows mobile user u by migrating to the currently connected edge server $e_t^c(u)$, high service migration delay results. The DQNSM method proposed by the invention therefore migrates the service instance to an appropriate MEC server at an appropriate time, better guaranteeing the user's QoS.
For the optimization problem P1 (equation 10), the present embodiment proposes a DQN-based service migration method (DQNSM) to minimize long-term system delay in MEC systems. As shown in fig. 2, the proposed MEC system is regarded as an environment, and the DRL agent selects the corresponding action by interacting with the environment.
Accordingly, the state space, action space, and reward function are defined as follows.
State space: the state space contains the load $V_t$ of all MEC servers in the region at time slot t; the location $Loc_t(u)$ of mobile user u (represented as a two-dimensional vector); the data size $D_t(u)$ and computation density $K_t(u)$ of Task_t(u) created by the intelligent application; the data size $S_t(u)$ of the corresponding service instance; and the MEC server $e_{t-1}^s(u)$ hosting the service instance at time slot t-1. Thus, the state space at time slot t can be expressed as:

$$s_t = \left\{ V_t,\, Loc_t(u),\, D_t(u),\, K_t(u),\, S_t(u),\, e_{t-1}^s(u) \right\}. \qquad (11)$$
action space: at time slot t, the service instance corresponding to mobile user u may be migrated to any one of the MEC servers within the area. Thus, the action space at time slot t can be expressed as:
$$a_t \in \{1, 2, \ldots, m, \ldots, M\}. \qquad (12)$$
Reward function: the optimization objective of the invention is to minimize the long-term system delay. At time slot t, the instantaneous reward of the system is therefore the negative of the sum of the user's migration, communication, and computation delays. The reward function is defined as follows:

$$r_t = -\left( MT_t(u) + HT_t(u) + CT_t(u) \right). \qquad (13)$$
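The state of equation (11) and the reward of equation (13) can be sketched as follows; the flat-vector field order and dtype are illustrative implementation choices, not fixed by the invention.

```python
import numpy as np

def build_state(server_loads, user_xy, task_bits, density, instance_bits, prev_server):
    """Sketch of Eq. (11): pack the observation into one flat vector for the
    Q-network input. Field order and scaling are illustrative choices."""
    return np.concatenate([
        np.asarray(server_loads, dtype=np.float32),        # V_t: per-server load
        np.asarray(user_xy, dtype=np.float32),             # Loc_t(u): 2-D position
        np.asarray([task_bits, density, instance_bits],    # D_t(u), K_t(u), S_t(u)
                   dtype=np.float32),
        np.asarray([prev_server], dtype=np.float32),       # e_{t-1}^s(u)
    ])

def reward(mt, ht, ct):
    """Eq. (13): the instantaneous reward is the negative total delay."""
    return -(mt + ht + ct)
```

With 4 servers this yields a 10-dimensional state, small enough for a fully connected Q-network.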
During service migration optimization in the proposed MEC system, the DRL agent deployed on mobile user u's device selects an action $a_t$ in the current system state $s_t$ according to the learned policy; the MEC controller then collects the service migration decisions of all users and executes the migrations. The environment feeds back the instantaneous reward $r_t$ and transitions from state $s_t$ to state $s_{t+1}$. This process is modeled as a Markov decision process (MDP).
DQNSM can effectively approximate the optimal service migration policy in a dynamic MEC environment and achieves the best performance under different MEC environment parameter settings, thereby guaranteeing the user's QoS. The proposed DQNSM method uses DQN to obtain the optimal service migration policy: an experience replay pool mitigates the correlation between data samples, and a target network stabilizes the training process. Furthermore, by integrating a DNN, the DQNSM method handles high-dimensional state spaces effectively.
The key steps of the DQNSM method provided by the invention are as shown in algorithm 1:
Based on the definitions of the state space in equation (11), the action space in equation (12), and the reward function in equation (13), first initialize the Q-network $Q(s, a; \theta)$ with parameters $\theta$; then initialize the target network $\hat{Q}$ and copy the Q-network parameters $\theta$ to the target network parameters $\theta^-$. At the same time, initialize the experience replay pool R, the number of training episodes E, and the maximum number of time slots T per episode. The data samples generated by the DRL agent's interaction with the environment have the Markov property, but training a neural network assumes independent and identically distributed samples. To address this, the DQNSM method breaks the correlation of training samples with the experience replay pool R, alleviating training instability. Lines 5-9: in each training episode, after mobile user u accesses the MEC system, a corresponding service instance is created on the MEC server nearest to it; in each subsequent time slot t, the DQNSM method feeds the system state $s_t$ into the Q-network, which generates the service migration decision $a_t$. Line 10: compute the instantaneous reward $r_t$ according to equation (13) and transition to the next state $s_{t+1}$. Lines 11-12: store the current experience sample $(s_t, a_t, r_t, s_{t+1})$ in R; once R holds at least N samples, randomly draw N samples from it for training the network parameters. Line 13: using the N randomly sampled experiences, compute the next-state action values and combine them with the current instantaneous reward $r_t$ to form the fitted target $y_t$. Line 14: define the loss function of the Q-network and minimize the mean-squared error with the Adam gradient-descent algorithm, so that the output of the Q-network approximates $y_t$; the updated Q-network thus better fits $Q(s_t, a_t)$. Lines 15-16: when the number of Q-network updates reaches λ, copy the Q-network parameters to the target network.
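The update loop of Algorithm 1 — replay pool, fitted target $y_t$, Adam descent on the MSE loss, and periodic target-network synchronization — can be sketched in PyTorch as below. The network sizes, λ, and all hyperparameters here are illustrative placeholders, not the patent's settings.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Illustrative sizes: 10-D state (4 server loads + 2-D location + 3 task
# features + previous server), one action per MEC server.
STATE_DIM, N_ACTIONS, GAMMA, LAM = 10, 4, 0.99, 100

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

qnet, target = make_qnet(), make_qnet()
target.load_state_dict(qnet.state_dict())   # copy theta -> theta^-
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay, updates = deque(maxlen=10_000), 0   # experience replay pool R

def train_step(batch_size=32):
    """One parameter update: sample from R, build y_t, minimize MSE (Lines 11-16)."""
    global updates
    if len(replay) < batch_size:
        return None
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():                                  # Line 13: fitted target y_t
        y = r + GAMMA * target(s2).max(dim=1).values
    q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)                    # Line 14: MSE loss, Adam descent
    opt.zero_grad(); loss.backward(); opt.step()
    updates += 1
    if updates % LAM == 0:                                 # Lines 15-16: sync target net
        target.load_state_dict(qnet.state_dict())
    return loss.item()
```

In use, each environment step appends a `(s_t, a_t, r_t, s_{t+1})` tuple to `replay` before calling `train_step()`.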
Experimental setup: the proposed DQNSM method builds and trains its neural network with Python 3.6 and the open-source framework PyTorch. All simulation experiments were run on a desktop computer with an Intel i5-10505 CPU (clock frequency 3.2 GHz) and 16 GB of memory. In the experiments, the MEC servers are fixed in location, while the mobile users generate computational tasks according to a random distribution. Based on a real vehicle-trajectory dataset from Rome, Italy, a region of central Rome is taken as the experimental scenario, with 4 MEC servers deployed within it. Each MEC server has a computing capacity of 20 GHz and a total bandwidth of 10 MHz. At the initial time slot $t_0$, every user accesses its nearest MEC server, which creates a service instance for it. During the experiment, mobile user u continuously requests its service instance at fixed time intervals. One training episode comprises 60 time slots. Once trained, the model can be applied to service migration optimization in different scenarios.
Based on the above settings, a number of simulation experiments were performed to evaluate the performance of the proposed DQNSM method and to compare with the following 4 reference methods.
AM: at any time slot t, the service instance always migrates with the user to the MEC server nearest to it.
NM: the service instance always remains at the initial MEC server; no migration is performed during the time period T.
PM: at any time slot t, the service instance migrates to a neighboring MEC server with a preset probability of 20%.
Greedy: considers only the best action for the current state, i.e., at each time slot t it pre-computes the rewards of migrating and of not migrating, and executes the action with the lower delay.
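The four baselines above can be sketched as one-line policies; the server identifiers and the `delay_fn` callback are illustrative, not part of the patent.

```python
import random

def am_policy(nearest_server):
    """AM: always migrate with the user to its nearest MEC server."""
    return nearest_server

def nm_policy(initial_server):
    """NM: never migrate; the instance stays on the initial server."""
    return initial_server

def pm_policy(current_server, neighbors, p=0.2):
    """PM: with probability p, migrate to a random neighboring server."""
    return random.choice(neighbors) if random.random() < p else current_server

def greedy_policy(candidates, delay_fn):
    """Greedy: per slot, pick the candidate server with the lowest
    pre-computed delay, with no regard for long-term reward."""
    return min(candidates, key=delay_fn)
```

Contrasting these fixed rules with the learned policy makes the evaluations in figs. 4–7 easy to reproduce in simulation.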
As shown in fig. 3, different values of the discount factor γ are compared. The discount factor γ is an important hyperparameter in DRL algorithms that balances the importance of current and future rewards: the larger the discount factor, the more the DRL agent emphasizes future rewards; the smaller it is, the more the agent favors immediate rewards. The invention compares γ = 0.90, 0.95, and 0.99. Since the range from 0.90 to 0.99 is narrow, different values of γ within it have little effect on the overall training results. The simulation results show that γ = 0.99 performs best, yielding the lowest total delay after different numbers of iterations. This is because the studied problem pursues a long-term optimization result: the larger the discount factor, the more weight future rewards receive, and with γ = 0.99 the DQNSM method more readily learns a policy with a larger long-term cumulative expected reward.
As shown in fig. 4, the invention evaluates the impact of network bandwidth on the different methods. As network bandwidth increases, the performance of all methods improves. This is because each user device connects to its nearest MEC server to send task requests, and the server's bandwidth is shared evenly among the users within its signal coverage; increasing the network bandwidth therefore increases each device's available bandwidth and reduces the users' wireless access delay. The DQNSM method outperforms the other algorithms because it dynamically observes the system state and adjusts its network parameters to adapt to bandwidth changes. The Greedy method is slightly worse: acting greedily on behalf of the mobile user, it obtains only the per-slot optimal migration strategy and cannot reach the long-term global optimum. The AM, NM, and PM methods migrate according to preset rules: when a user moves frequently between MEC servers, AM incurs high service migration delay; when a user is far from its initial MEC server and hardly moves, NM incurs high communication delay; PM is a compromise suited to users who move only near the server of the previous time slot. These methods work well in specific scenarios but cannot adapt to a complex, dynamic edge system.
As shown in fig. 5, the invention evaluates the effect of the MEC servers' computing frequency on the different methods. As the computing capacity of the MEC servers increases, the performance of all methods improves. This is because at each time slot the user device sends a task request to the MEC server hosting its service instance, and that server allocates the corresponding computing resources; as server capacity grows, the computing resources allocated to each service instance increase, reducing task processing time. The DQNSM method shows good adaptability and generalization: it learns the optimal policy through interaction with the MEC environment, is not bound by fixed rules, and adapts to different environmental conditions. The Greedy method selects the best action for the current state at each time slot without regard for long-term benefit, which can trap it in a local optimum. The remaining algorithms cannot change their migration decisions dynamically with the environment state and thus cannot perceive changes in server computing capacity; as rule-based methods, their hand-crafted rules struggle with complex edge environments, lack generalization in higher-dimensional state spaces, and cannot adapt to user mobility.
As shown in fig. 6, the application evaluates the effect of the per-hop migration coefficient on the different methods. The NM method never migrates a service instance, so its performance does not change with the per-hop migration coefficient $\sigma_m$. As $\sigma_m$ increases, the performance of all other methods gradually declines, because a larger $\sigma_m$ increases the migration delay. Compared with the other 4 reference methods, DQNSM performs best: once the trained agent is deployed on the mobile user's device, it dynamically receives the MEC environment state, adjusts its network parameters, and adaptively outputs service migration actions. The Greedy method outperforms NM when $\sigma_m$ is small; once $\sigma_m$ reaches 2.5, it performs worse than NM, because Greedy, acting greedily per slot, obtains only per-slot optimal migration strategies rather than the global optimum, while NM is unaffected by $\sigma_m$. The PM and AM methods perform worst because they cannot change their migration decisions in response to environmental conditions.
As shown in fig. 7, the impact of the number of users on the different methods is evaluated. As the number of users increases, the total reward of the system decreases. This is because the user devices share the bandwidth of the MEC servers and their service instances share the servers' computing resources, further increasing task response delay. The DQNSM method is model-independent: even if the number of users changes, the model need not be redesigned; exploiting the generalization of the deep neural network, it adaptively learns and adapts to changes in the MEC environment by training and adjusting network parameters, achieving better performance on the service migration problem. The Greedy method lacks learning capability: it outputs migration decisions directly from the current MEC environment state and cannot obtain a long-term global optimum. The remaining methods cannot change their migration decisions dynamically with the environment state, no matter how the MEC environment changes.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any person skilled in the art may modify or alter the disclosed technical content to obtain equivalent embodiments. However, any simple modification, equivalent variation, or alteration of the above embodiments according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.
The present invention is not limited to the above-mentioned best mode, any person can obtain other service migration methods based on DQN in various edge environments under the teaching of the present invention, and all equivalent changes and modifications according to the scope of the present invention shall be covered by the present invention.

Claims (4)

1. The service migration method based on the DQN in the edge environment is characterized in that long-term QoS is taken as an optimization target, a state space, an action space and a reward function of a service migration problem in the MEC environment are defined based on a deep reinforcement learning framework, and the optimization problem is formalized and expressed as a Markov decision process; and solving the optimization problem by using a service migration method DQNSM based on DQN, and adopting an experience playback pool and a target network to achieve faster convergence speed and better convergence effect.
2. The DQN-based service migration method in an edge environment according to claim 1, wherein:
the MEC system is composed of M MEC servers, U mobile users, and one MEC controller, where the M MEC servers are recorded as a set M = {1, 2, ..., m, ..., M} and the U mobile users as a set U = {1, 2, ..., u, ..., U}; mobile user u sends task processing requests to its service instance through the 5G network, the MEC server is responsible for receiving and forwarding task requests to the corresponding service instance, and the MEC controller is responsible for collecting and executing the service migration decisions of all mobile users;
the MEC system operates in discrete time slots, and the location of mobile user u changes at the beginning of each time slot t ∈ {0, 1, 2, ..., T}; to serve the intelligent application, a computation-intensive Task_t(u) is generated at each time slot t;
at time slot $t_0$, the device of mobile user u accesses the MEC system and creates a service instance on the MEC server nearest to it; the service instance continuously provides computing services for mobile user u in subsequent time slots; after receiving a task request, the service instance on the MEC server processes the task and returns the result to the user device; whether sending task data or receiving results, mobile user u's device always communicates with its nearest MEC server; the MEC servers are interconnected by stable backhaul links;
the MEC server to which mobile user u is connected at time slot t is denoted $e_t^c(u)$, and the MEC server where the service instance resides is denoted $e_t^s(u)$; by selecting appropriate service migration times and destinations, long-term system delay is reduced and QoS guaranteed, with delay taken as the QoS metric to define and formalize the corresponding optimization problem.
3. The DQN-based service migration method in an edge environment according to claim 2, wherein:
in the migration model of the present invention, $e_{t-1}^s(u)$ is defined as the MEC server where mobile user u's service instance resides at time slot t-1, and $a_t(u) \in M$ is defined as the service migration decision of mobile user u at time slot t; that is, at time slot t, the service instance corresponding to mobile user u migrates to MEC server $a_t(u)$ according to $a_t(u)$; $d_t(u)$ measures the hop distance between $e_{t-1}^s(u)$ and $a_t(u)$; if $e_{t-1}^s(u) = a_t(u)$, i.e., $d_t(u) = 0$, no migration is required; otherwise the service instance must be moved from $e_{t-1}^s(u)$ to $a_t(u)$, incurring a migration delay; the migration delay is a monotonically non-decreasing function of $d_t(u)$; the service migration delay of mobile user u at time slot t is defined as follows:

$$MT_t(u) = \begin{cases} 0, & d_t(u) = 0, \\ \dfrac{S_t(u)}{\eta} + \sigma_m\, d_t(u), & d_t(u) > 0, \end{cases} \qquad (1)$$

where $S_t(u)$ represents the data size of the service instance corresponding to mobile user u, $\eta$ represents the network bandwidth, and $\sigma_m$ represents the migration delay coefficient per hop;
after completing the service migration, the device of mobile user u offloads Task_t(u) to the corresponding service instance for processing, incurring a communication delay; the communication delay consists of the wireless access delay between mobile user u's local device and $e_t^c(u)$ and the backhaul link transmission delay between $e_t^c(u)$ and $e_t^s(u)$:
first, consider the wireless access delay between mobile user u's local device and $e_t^c(u)$; at time slot t, the signal-to-noise ratio between mobile user u's local device and $e_t^c(u)$ is defined as follows:

$$\Gamma_t(u) = \frac{P_u\, g_t(u)}{\sigma^2}, \qquad (2)$$

where $P_u$ represents the transmit power of mobile user u's local device, $g_t(u)$ represents the channel gain between the local device and $e_t^c(u)$, and $\sigma^2$ represents the average power of the Gaussian noise; the channel gain $g_t(u)$ depends on the distance between the two and is formally defined as follows:

$$g_t(u) = \frac{\alpha}{(l_t(u))^2}, \qquad (3)$$

where $\alpha$ represents the channel gain at unit distance and $l_t(u)$ represents the actual distance between mobile user u's local device and $e_t^c(u)$; the total bandwidth of the MEC server is defined as B, and OFDM technology is used to allocate the bandwidth orthogonally to the user devices within the area at each time slot t; the wireless uplink transmission rate of mobile user u's local device at time slot t is defined as follows:

$$R_t(u) = B_t(u) \log_2\!\left(1 + \Gamma_t(u)\right), \qquad (4)$$
wherein B is t (u) represents the available bandwidth of the mobile user u's local device at time slot t;
thus, the wireless access delay between mobile user u's local device and $e_t^c(u)$ is defined as follows:

$$PT_t(u) = \frac{D_t(u)}{R_t(u)}, \qquad (5)$$

where $D_t(u)$ represents the data size of the offloaded task;
when $e_t^c(u) \neq e_t^s(u)$, the task data $D_t(u)$ must be transmitted over the backhaul links between the MEC servers, incurring a backhaul transmission delay that depends on the data size $D_t(u)$ and the hop distance between the two, measured by $y_t(u)$; the backhaul link transmission delay between $e_t^c(u)$ and $e_t^s(u)$ is defined as follows:

$$ST_t(u) = \frac{D_t(u)}{\eta} + \sigma_{bh}\, y_t(u), \qquad (6)$$

where $D_t(u)$ represents the data size of the offloaded task, $\eta$ represents the network bandwidth of the backhaul link, and $\sigma_{bh}$ represents the transmission delay coefficient per hop;
thus, the communication delay of mobile user u at time slot t is defined as follows:
$$HT_t(u) = PT_t(u) + ST_t(u). \qquad (7)$$
in the time slot t, the equipment of the mobile user u unloads the task to the corresponding service instance for processing, and the MEC server distributes computing resources for the service instance of the user by adopting a resource virtualization technology to support parallel operation of the service instance; computing Task t The number of CPU cycles required for (u) is defined as follows:
$$C_t(u) = D_t(u)\, K_t(u), \qquad (8)$$
wherein K is t (u) represents the computation density of the task;
$V_t(m)$ is then used to measure the total load of target server m at time slot t, and F represents the computing capacity of the MEC server; a weighted resource-allocation strategy is adopted, i.e., the computing resources allocated to a service instance are proportional to the number of CPU cycles required by its task in the time slot; the computation delay of mobile user u at time slot t is defined as follows:

$$CT_t(u) = \frac{C_t(u)}{f_t(u)} = \frac{V_t(e_t^s(u))}{F}, \qquad (9)$$

where $f_t(u) = \frac{C_t(u)}{V_t(e_t^s(u))} F$ is the computing frequency allocated to the service instance;
4. a DQN-based service migration method in an edge environment according to claim 3, characterized in that:
based on the system model definition, the goal of the proposed MEC system over the time horizon T is to minimize the long-term system delay, including the sum of the migration, communication, and computation delays of all mobile users; the optimization problem P1 is formalized as follows:

$$P1: \quad \min \sum_{t=1}^{T} \sum_{u=1}^{U} \left( MT_t(u) + HT_t(u) + CT_t(u) \right); \qquad (10)$$
for the optimization problem P1, DQNSM is adopted to minimize the long-term system delay in the MEC system; the MEC system is regarded as an environment, and the DRL agent selects corresponding actions by interacting with the environment;
wherein the state space, action space and reward function are defined as follows;
state space: the state space contains the load $V_t$ of all MEC servers in the region at time slot t; the location $Loc_t(u)$ of mobile user u, represented as a two-dimensional vector; the data size $D_t(u)$ and computation density $K_t(u)$ of Task_t(u) generated by the intelligent application; the data size $S_t(u)$ of the corresponding service instance; and the MEC server $e_{t-1}^s(u)$ at time slot t-1; thus, the state space at time slot t is expressed as:

$$s_t = \left\{ V_t,\, Loc_t(u),\, D_t(u),\, K_t(u),\, S_t(u),\, e_{t-1}^s(u) \right\}; \qquad (11)$$
action space: in the time slot t, the service instance corresponding to the mobile user u is allowed to be migrated to any MEC server in the area; therefore, the action space at time slot t is expressed as:
$$a_t \in \{1, 2, \ldots, m, \ldots, M\} \qquad (12)$$
reward function: the optimization objective is to minimize the long-term system delay; thus, at time slot t, the instantaneous reward of the system is the negative of the sum of the user's migration, communication, and computation delays; the reward function is defined as follows:

$$r_t = -\left( MT_t(u) + HT_t(u) + CT_t(u) \right) \qquad (13)$$
in the service migration optimization process of the MEC system, the DRL agent deployed on mobile user u's device selects an action $a_t$ in the current system state $s_t$ according to the learned policy; the MEC controller collects the service migration decisions of all users and executes the service migrations; the environment feeds back the instantaneous reward $r_t$ and transitions from state $s_t$ to state $s_{t+1}$;
The DQNSM method comprises the following steps: based on the definitions of the state space in equation (11), the action space in equation (12), and the reward function in equation (13), first initialize the Q-network $Q(s, a; \theta)$ with parameters $\theta$; then initialize the target network $\hat{Q}$ and copy the Q-network parameters $\theta$ to the target network parameters $\theta^-$; at the same time, initialize the experience replay pool R, the number of training episodes E, and the maximum number of time slots T per episode; by setting the experience replay pool R, the correlation of training samples is broken and instability in the training process is alleviated; in each training episode, after accessing the MEC system, mobile user u creates a corresponding service instance on its nearest MEC server, and in each subsequent time slot t the system state $s_t$ is fed into the Q-network, which generates the service migration decision $a_t$; the instantaneous reward $r_t$ is computed according to equation (13) and the system transitions to the next state $s_{t+1}$; the current experience sample $(s_t, a_t, r_t, s_{t+1})$ is stored in R, and once R holds N samples, N samples are randomly drawn from it for training the network parameters; using the N randomly sampled experiences, the next-state action values are computed and combined with the current instantaneous reward $r_t$ to form the fitted target $y_t$; the loss function of the Q-network is defined and the mean-squared error minimized with the Adam gradient-descent algorithm so that the output of the Q-network approximates $y_t$, i.e., the updated Q-network better fits $Q(s_t, a_t)$; when the number of Q-network updates reaches λ, the Q-network parameters are copied to the target network.
CN202311287891.9A 2023-10-08 2023-10-08 DQN-based service migration method in edge environment Pending CN117202265A (en)

Publications (1)

Publication Number Publication Date
CN117202265A true CN117202265A (en) 2023-12-08


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117395726A (en) * 2023-12-12 2024-01-12 江西师范大学 Mobile edge computing service migration method based on path planning
CN117395726B (en) * 2023-12-12 2024-03-01 江西师范大学 Mobile edge computing service migration method based on path planning

Sun et al. A DQN-based cache strategy for mobile edge networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination