CN117750436A - Security service migration method and system in mobile edge computing scene - Google Patents


Info

Publication number: CN117750436A (application CN202410166467.7A); granted publication CN117750436B
Authority: CN (China)
Prior art keywords: model, representing, migration, state, network
Legal status: Granted / Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117750436B (en)
Inventors: 聂学方, 张鼎鼎, 王辰, 周天清
Current and original assignee: East China Jiaotong University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority and filing date: 2024-02-06 (the priority date is an assumption and is not a legal conclusion)
Application filed by East China Jiaotong University; priority to CN202410166467.7A
Publication of CN117750436A; application granted; publication of CN117750436B


Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a security service migration method and system in a mobile edge computing scenario. The method comprises the following steps: acquiring a moving track of a mobile user, and marking the track positions of the moving track to form a data set; simulating a real target scene according to the data set to construct a system network model, adding security considerations, defining a delay-sum objective function for the model, and taking the negative of the value obtained by the function as a reward value; modeling the migration problem as a partially observable Markov decision process model, combining this model with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and optimally training the potential space model until the reward value is maximized, so as to finally obtain the optimal security service migration decision of the mobile user. The method and the system take minimizing the system time delay as the overall objective and, based on a partially observable Markov decision process model, fuse a deep reinforcement learning algorithm, thereby improving the security and accuracy of migration decisions.

Description

Security service migration method and system in mobile edge computing scene
Technical Field
The present invention relates to the field of mobile edge computing, and in particular, to a method and system for secure service migration in a mobile edge computing scenario.
Background
As a user moves, the application service needs to be migrated from one computing environment to another following the user's location, so as to reduce latency and meet the service performance the user requires. Mobile edge computing has been proposed as a novel network computing paradigm to meet the increasing quality-of-service demands of mobile users. Traditionally, mobile users must transmit large-scale data to a cloud data center for real-time processing, which raises a number of challenges, including increasing task complexity, security protection of data, and strict user latency requirements. Mobile edge computing (MEC) provides users with high-bandwidth network service and ultra-low latency by deploying computing and storage resources at the mobile network edge, offering an excellent service environment and powerful computing capability for the mobile network. By exploiting the characteristics of the edge computing environment, services can be migrated closer to the user, resulting in better performance and user experience. Therefore, mobile edge computing is a major direction of development for migrating the services of mobile users.
Currently, Markov decision process-based models are widely used to solve the mobile user service migration problem. By defining a reward function and the transition probabilities between states and actions, a value iteration algorithm can be used to solve the service migration problem based on a Markov decision process. Existing Markov decision process-based schemes typically rely on global system information to make decisions.
However, unlike the idealized assumption of complete information, it is impractical in most related works to obtain complete models of user and server information. When a user makes a decision, the observed state information is often incomplete, which reduces the correctness of the migration decision. Moreover, servers located at the network edge are vulnerable to close-range attacks, offloading intensive services to MEC servers carries a risk of security vulnerabilities, and the security services employed by edge nodes may themselves incur additional delay overhead. Therefore, existing Markov decision process-based schemes suffer from insufficient accuracy and security of migration decisions.
Disclosure of Invention
Based on the above, the application provides a security service migration method and a security service migration system in a mobile edge computing scene, which aim to solve the problems of insufficient accuracy and security of migration decisions existing in the conventional Markov decision process based scheme.
A first aspect of the embodiments provides a security service migration method in a mobile edge computing scenario, including:
acquiring a moving track of a mobile user, marking track positions at different moments on the moving track to form a track data set, and dividing the track data set into a training set and a verification set;
simulating a real target scene according to the track data set to construct a system network model, setting a delay sum objective function of a migration strategy according to the system network model, and taking a negative value obtained by the delay sum objective function as a reward value;
modeling the migration problem as a partially observable Markov decision process model, wherein the partially observable Markov decision process model is combined with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and the training set is used to cyclically train the potential space model until the reward value is maximized, so as to obtain an optimized migration decision model;
and inputting the verification set into the optimized migration decision model to finally obtain the optimal security service migration decision of the mobile user.
As an optional implementation manner of the first aspect, the delay-sum objective function is expressed as:

$D(a_t) = D_t^{\mathrm{mig}} + D_t^{\mathrm{comp}} + D_t^{\mathrm{comm}} + D_t^{\mathrm{sec}}$

wherein $t$ represents a time slot, $a_t$ represents the migration decision of time slot $t$, $D_t^{\mathrm{mig}}$ represents the migration delay of the mobile user, $D_t^{\mathrm{comp}}$ represents the computation delay, $D_t^{\mathrm{comm}}$ represents the total communication delay, and $D_t^{\mathrm{sec}}$ represents the total security delay.

As an optional implementation manner of the first aspect, the migration delay is expressed as:

$D_t^{\mathrm{mig}} = \eta \, h_t$

wherein $\eta$ represents the migration delay factor and $h_t$ represents the hop distance between the current service node and the previous service node;

the computation delay is expressed as:

$D_t^{\mathrm{comp}} = \dfrac{\rho_t \, \lambda_t}{\xi_t \, f}$

wherein $\lambda_t$ represents the number of CPU cycles required to complete the computing service, $\rho_t$ represents the computation offload fraction of the service generated by the user, $f$ represents the computing power of the service node, and $\xi_t$ represents the ratio of computing resources allocated to the service;

the total communication delay $D_t^{\mathrm{comm}}$ is expressed in terms of $\Lambda$, the sum of CPU cycles required to handle the offloaded service, $d$, the size of the service data to be uploaded, $C$, the channel capacity, $r$, the average data transmission rate from the base station to the edge server, and $H$, the number of hops required from the base station to the edge server over the backhaul network;

the total security delay is expressed as:

$D_t^{\mathrm{sec}} = D_t^{\mathrm{vul}} + D_t^{\mathrm{pro}}$

wherein $x_t$ represents the decision variable, $D_t^{\mathrm{vul}}$ represents the security vulnerability delay of time slot $t$, $D_t^{\mathrm{pro}}$ represents the security protection delay of time slot $t$, $l$ represents the security protection level of the service, $l^{e}$ represents the expected security level, $M$ represents the total number of tasks, $L$ represents the total number of security levels, and $\kappa_l$ represents the delay coefficient corresponding to each security protection level.
As an optional implementation manner of the first aspect, the step of modeling the migration problem as a partially observable Markov decision process model, which is combined with a proximal policy optimization deep learning algorithm integrated with the long short-term memory neural network to construct the potential space model, includes:

extracting historical information through the long short-term memory neural network, inferring the overall state from the historical information through an encoder model formed by an observation model and a hidden state transition model, and obtaining a reward value through a reward model formed by a deterministic state transition model and a hidden state transition model, wherein:

the deterministic state transition model is expressed as: $P(d_t \mid d_{t-1}, a_{t-1})$;

the hidden state transition model is expressed as: $P(s_t \mid s_{t-1}, d_{t-1}, a_{t-1})$;

the observation model is expressed as: $P(o_t \mid s_t, d_t)$;

the reward model is expressed as: $P(r_t \mid s_t, d_t)$;

the encoder model is expressed as: $P(\hat{s}_t \mid s_t, o_t)$;

wherein $P$ represents a state transition probability and $f$ represents the LSTM network function used to extract the historical information;

$d_t$ represents the deterministic state of time slot $t$; the deterministic state transition model gives the state transition probability of obtaining the deterministic state $d_t$ when the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$s_t$ represents the hidden state of time slot $t$; the hidden state transition model gives the state transition probability of obtaining the hidden state $s_t$ when the hidden state $s_{t-1}$, the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$o_t$ represents the observation state of time slot $t$; the observation model gives the state transition probability of obtaining the observation state $o_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$r_t$ represents the reward state of time slot $t$; the reward model gives the state transition probability of obtaining the reward state $r_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$\hat{s}_t$ represents the overall state; the encoder model gives the state transition probability of obtaining the overall state $\hat{s}_t$ when the hidden state $s_t$ and the observation state $o_t$ of time slot $t$ are input.
The proximal policy optimization deep learning algorithm includes an actor–critic structure that takes the hidden state inferred by the potential space model, i.e. the overall state $\hat{s}_t$, as input, with $\theta$ and $\phi$ denoting the parameters of the actor network and the critic network. The goal of the actor network is to approximate the policy $\pi_\theta(a_t \mid \hat{s}_t)$, the distribution over the output action space for a given time slot $t$; the critic network approximates the value $V_\phi(\hat{s}_t)$, an estimate of the expected return obtained by starting from $\hat{s}_t$ and then following the policy $\pi$.

The approximated policy is defined as:

$\pi_\theta(a_t \mid \hat{s}_t) = P(a_t \mid \hat{s}_t)$

The approximated value function is defined as:

$V_\phi(\hat{s}_t) = \mathbb{E}\Big[\sum_{k=0}^{T-t} \gamma^{k} r_{t+k}\Big]$

wherein $P(a_t \mid \hat{s}_t)$ represents the transition probability of the action decision $a_t$ and the overall state $\hat{s}_t$ of time slot $t$, $\gamma$ is the attenuation coefficient, $r$ is the immediate reward whose value varies with the state, the index $k$ runs from the start to the end, and $\mathbb{E}$ represents the expected value.

The advantage function $A(\hat{s}_t, a_t)$ is used as the evaluation function of the critic network, and the parameterized advantage function is expressed as:

$A(\hat{s}_t, a_t) = Q(\hat{s}_t, a_t) - V(\hat{s}_t)$

wherein $Q(\hat{s}_t, a_t)$ is the action value function and $V(\hat{s}_t)$ is the state value function;

a generalized advantage function $\hat{A}_t$ is introduced to control the balance between bias and variance, and the generalized advantage is expressed as:

$\hat{A}_t = \sum_{q=0}^{T-t} (\gamma\lambda)^{q}\, \delta_{t+q}$

wherein $r$ represents a reward, $\lambda$ is used to adjust the trade-off between variance and bias, the index $q$ runs from zero to the end of slot $T$, $\delta_t$ is the time-slot difference error, and $V$ is the state value function;

$\delta_t = r_t + \gamma V(\hat{s}_{t+1}) - V(\hat{s}_t)$

wherein $r_t$ represents the reward of time slot $t$, $V(\hat{s}_{t+1})$ is the state value function of time slot $t+1$, $\gamma$ is the attenuation coefficient, and $V(\hat{s}_t)$ is the state value function of time slot $t$.

The loss function of the critic network is defined as:

$L(\phi) = \mathbb{E}_t\big[\delta_t^{2}\big]$

The loss function of the actor network is defined as:

$L(\theta) = -\,\mathbb{E}_t\Big[\min\big(\rho_t(\theta)\hat{A}_t,\ \mathrm{clip}(\rho_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big) + \beta H(\pi_\theta)\Big]$

wherein $L(\phi)$ represents the critic network loss, $L(\theta)$ represents the actor network loss, $\theta$ and $\phi$ respectively represent the parameters of the actor network and the critic network, $\mathbb{E}_t$ denotes the expected value following the policy $\pi_\theta$ along the sampled trajectory $\tau$, $\rho_t(\theta)$ is the importance ratio, i.e. the ratio of the new objective policy to the current policy, $\mathrm{clip}$ represents the clipping function that clips values falling outside $(1-\epsilon, 1+\epsilon)$, $\epsilon$ represents a defined constant, $\hat{A}_t$ represents the generalized advantage function, $H$ represents the entropy of the policy, and $\beta$ is the entropy coefficient.

The parameters $\theta$ and $\phi$ are updated by gradient descent that minimizes the loss functions of the actor network and the critic network.
A second aspect of an embodiment of the present application provides a security service migration system for a mobile user in a mobile edge computing scenario, including:
the system comprises an acquisition data and preprocessing module, a data acquisition and preprocessing module and a data processing module, wherein the acquisition data and preprocessing module is used for acquiring a moving track of a mobile user, marking track positions at different moments for the moving track to form a track data set, and dividing the track data set into a training set and a verification set;
the system network model building module is used for simulating a real target scene according to the track data set so as to build a system network model, setting a delay sum objective function of a migration strategy according to the system network model, and taking a negative value obtained by the delay sum objective function as a reward value;
the potential space model training module is used for modeling the migration problem as a partially observable Markov decision process model, combining the partially observable Markov decision process model with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and cyclically training the potential space model with the training set until the reward value is maximized, so as to obtain an optimized migration decision model;
and the migration decision module is used for inputting the verification set into the optimized migration decision model to finally obtain the optimal security service migration decision of the mobile user.
A third aspect of the embodiments of the present application provides a computer device including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to implement the above-mentioned security service migration method in a mobile edge computing scenario when executing the computer program stored on the memory.
Compared with the prior art, in the security service migration method in a mobile edge computing scenario provided by the present application, a system network model is constructed by simulating a real scene, and a delay-sum objective function is defined on the model. The total security delay is taken into account in the delay-sum objective function, which improves the security of migration decisions. By constructing a partially observable Markov decision process model, the hidden environment information of the mobile user in the migration state can be inferred; historical information is collected by a long short-term memory neural network, and this neural network is introduced into the proximal policy optimization algorithm to obtain a novel deep reinforcement learning algorithm. The deep reinforcement learning algorithm is combined with the partially observable Markov decision process model to construct a potential space model, the model is optimally trained so that the reward value obtained from the defined delay-sum objective function is maximized, and the migration decision finally obtained is the optimal security service migration strategy, which further improves the accuracy of migration decisions. Therefore, the method and the system provided by the application can solve the problems of insufficient accuracy and security of migration decisions in the prior art.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
FIG. 1 is a flowchart of a method for secure service migration in a mobile edge computing scenario according to an embodiment of the present application;
FIG. 2 is a system network model diagram of a security service migration method in a mobile edge computing scenario according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a partially observable Markov decision process in a security service migration method in a mobile edge computing scenario according to one embodiment of the present application;
FIG. 4 is a diagram of a potential space model in a security service migration method in a mobile edge computing scenario according to an embodiment of the present application;
FIG. 5 is an experimental flowchart of a comparative example of the security service migration method in a mobile edge computing scenario provided in the present application;
FIG. 6 is a graph comparing the average total reward of different algorithms on the San Francisco trajectory in a comparative example of the security service migration method in a mobile edge computing scenario provided in the present application;
FIG. 7 is a graph comparing the average total reward of different algorithms on the Rome trajectory in a comparative example of the security service migration method in a mobile edge computing scenario provided in the present application;
fig. 8 is a schematic structural diagram of a security service migration system in a mobile edge computing scenario according to an embodiment of the present application.
The following detailed description will further illustrate the application in conjunction with the above-described figures.
Detailed Description
In order to facilitate an understanding of the present application, a more complete description of the present application will now be provided with reference to the relevant figures. Several embodiments of the present application are presented in the accompanying drawings. This application may, however, be embodied in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In order to illustrate the technical solutions described in the present application, the following description is made by specific examples.
Referring to fig. 1, a flowchart of a security service migration method in a mobile edge computing scenario according to an embodiment of the present application is shown, and the details are as follows:
step S1, a moving track of a mobile user is obtained, track positions at different moments are marked on the moving track to form a track data set, and the track data set is divided into a training set and a verification set.
It should be noted that the key factors affecting the migration decision of the mobile user in a given time slot mainly include information such as the mobility of the user, the operating load of the edge server, and the computing resource allocation of the edge server; the latent information and historical information arising during migration training are observed and collected.
Optionally, the mobile user's movement track is acquired based on GPS data positioning.
Further, the trajectory data set is traversed for data cleaning, and feature extraction is performed on the training set, covering temporal features, spatial features, exploratory data analysis, feature scaling, feature selection and the like, to improve the accuracy and richness of the training samples.
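As an illustrative sketch only (not the patent's reference implementation), the cleaning, feature extraction and train/validation split described above could be organized as follows in Python; the column names (user_id, timestamp, x, y), the Unix-second timestamps, and the 80/20 split ratio are assumptions introduced for illustration.

```python
import pandas as pd

def preprocess_trajectories(csv_path: str, train_ratio: float = 0.8):
    """Clean a raw trajectory log and split it into training / validation sets.

    Assumed columns: user_id, timestamp, x, y (GPS position of the mobile user).
    """
    df = pd.read_csv(csv_path)

    # Data cleaning: drop duplicates and rows with missing positions.
    df = df.drop_duplicates().dropna(subset=["x", "y"])
    df = df.sort_values(["user_id", "timestamp"])

    # Temporal feature: hour of day (timestamps assumed to be Unix seconds).
    df["hour"] = pd.to_datetime(df["timestamp"], unit="s").dt.hour

    # Spatial features: per-user displacement between consecutive slots.
    df["dx"] = df.groupby("user_id")["x"].diff().fillna(0.0)
    df["dy"] = df.groupby("user_id")["y"].diff().fillna(0.0)

    # Feature scaling to zero mean / unit variance.
    for col in ["x", "y", "dx", "dy"]:
        df[col] = (df[col] - df[col].mean()) / (df[col].std() + 1e-8)

    # Chronological split into training and validation sets.
    split = int(len(df) * train_ratio)
    return df.iloc[:split], df.iloc[split:]
```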
Step S2: simulating a real target scene according to the track data set to construct a system network model, setting a delay sum objective function of a migration strategy according to the system network model, and taking a negative value obtained by the delay sum objective function as a reward value.
It should be noted that the present invention takes the quality of service of the user as the measurement standard; the quality of service mainly considers the migration, computation, communication and security delays of the mobile user's service, the negative of the sum of these delays is taken as the reward, and the migration strategy that maximizes this reward is the optimal security strategy.
Fig. 2 is a system network model diagram, and proposes a mobile edge scene composed of a mobile edge computing network and a macro base station in consideration of the optimization problem of service migration in the mobile edge scene. The mobile edge computing network is composed of x users, y edge computing servers and base station nodes, wherein each edge server is connected with one base station to form a server node. Users include various mobile devices such as smartphones, vehicles, and the like. The server node is made up of a plurality of edge servers, each of which serves tasks that handle mobile user offloading. The server nodes are connected with each other through optical fibers, and the macro base station is connected with the micro base station through wireless connection, so that services can be migrated between the edge servers. At the beginning of each slot, the mobile user makes a decision based on the information of the environment. Taking the user service quality as a measurement standard, the factors such as migration, calculation, communication, safety delay and the like are mainly considered. In order to maintain high quality of service, services need to dynamically migrate between mobile edge computing servers following mobile users.
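The following is a minimal, hypothetical data-structure sketch of the system network model just described (users, base-station/edge-server nodes, fiber backhaul links); the class names, the line topology, and the 10 GHz CPU frequency are illustrative assumptions rather than values taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ServerNode:
    """One base station plus its co-located edge server."""
    node_id: int
    cpu_frequency: float                                     # computing power (cycles/s)
    backhaul_neighbors: list = field(default_factory=list)   # fiber-connected node ids

@dataclass
class MobileUser:
    user_id: int
    position: tuple     # (x, y) taken from the trajectory data set
    serving_node: int   # node currently hosting the user's service

# Example topology: y = 4 server nodes on a line, fiber-connected to their neighbors.
nodes = [ServerNode(i, cpu_frequency=10e9) for i in range(4)]
for i, node in enumerate(nodes):
    node.backhaul_neighbors = [j for j in (i - 1, i + 1) if 0 <= j < 4]

def hop_distance(src: int, dst: int) -> int:
    """Hop distance between two server nodes (line topology assumed here)."""
    return abs(src - dst)
```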
Specifically, the delay-sum objective function is expressed as:

$D(a_t) = D_t^{\mathrm{mig}} + D_t^{\mathrm{comp}} + D_t^{\mathrm{comm}} + D_t^{\mathrm{sec}}$

wherein $t$ represents a time slot, $a_t$ represents the migration decision of time slot $t$, $D_t^{\mathrm{mig}}$ represents the migration delay of the mobile user, $D_t^{\mathrm{comp}}$ represents the computation delay, $D_t^{\mathrm{comm}}$ represents the total communication delay, and $D_t^{\mathrm{sec}}$ represents the total security delay.

The migration delay is expressed as:

$D_t^{\mathrm{mig}} = \eta \, h_t$

wherein $\eta$ represents the migration delay factor and $h_t$ represents the hop distance between the current service node and the previous service node.

The computation delay is expressed as:

$D_t^{\mathrm{comp}} = \dfrac{\rho_t \, \lambda_t}{\xi_t \, f}$

wherein $\lambda_t$ represents the number of CPU cycles required to complete the computing service, $\rho_t$ represents the computation offload fraction of the service generated by the user, $f$ represents the computing power of the service node, and $\xi_t$ represents the ratio of computing resources allocated to the service.

The total communication delay $D_t^{\mathrm{comm}}$ is expressed in terms of $\Lambda$, the sum of CPU cycles required to handle the offloaded service, $d$, the size of the service data to be uploaded, $C$, the channel capacity, $r$, the average data transmission rate from the base station to the edge server, and $H$, the number of hops required from the base station to the edge server over the backhaul network.

The total security delay is expressed as:

$D_t^{\mathrm{sec}} = D_t^{\mathrm{vul}} + D_t^{\mathrm{pro}}$

wherein $x_t$ represents the decision variable, $D_t^{\mathrm{vul}}$ represents the security vulnerability delay of time slot $t$, $D_t^{\mathrm{pro}}$ represents the security protection delay of time slot $t$, $l$ represents the security protection level of the service, $l^{e}$ represents the expected security level, $M$ represents the total number of tasks, $L$ represents the total number of security levels, and $\kappa_l$ represents the delay coefficient corresponding to each security protection level.
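A hedged computational sketch of this reward follows; the migration- and computation-delay helpers mirror the reconstructed expressions above, while the communication and security delays are passed in as precomputed values because their exact formulas are not reproduced here.

```python
def migration_delay(eta: float, hops: int) -> float:
    # Migration delay factor times hop distance between old and new service node.
    return eta * hops

def computation_delay(cycles: float, offload_frac: float,
                      node_capacity: float, alloc_ratio: float) -> float:
    # CPU cycles of the offloaded part divided by the resources actually allocated.
    return (offload_frac * cycles) / (alloc_ratio * node_capacity)

def total_delay(d_mig: float, d_comp: float, d_comm: float, d_sec: float) -> float:
    return d_mig + d_comp + d_comm + d_sec

def reward(d_mig: float, d_comp: float, d_comm: float, d_sec: float) -> float:
    # The reward is the negative of the delay sum; maximizing it minimizes delay.
    return -total_delay(d_mig, d_comp, d_comm, d_sec)
```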
Step S3: modeling the migration problem as a part of observable Markov decision process model, combining the part of observable Markov decision process model with an approximate strategy optimization deep learning algorithm integrated with a long-short-term memory neural network to construct a potential space model, and carrying out cyclic training on the potential space model by the training set until the reward value is maximized to obtain an optimized migration decision model.
First, in constructing a potential space model, the following description will be made.
It should be noted that when a user makes a decision, the observed state information is often incomplete, so it is modeled as a partially observable markov decision process (Partially Observable Markov Decision Process, POMDP) model. Therefore, obtaining history information through Long Short-Term Memory (LSTM) and establishing an effective potential space model to infer an unobserved state of the environment is an effective scheme.
It should be noted that, in order to better infer the hidden state, a partially observable markov decision process diagram is shown in fig. 3, where the deterministic state and the hidden state are considered jointly. The LSTM network has excellent performance for information prediction, and is an effective scheme for making an optimal migration decision by sensing information through the LSTM and performing speculation according to historical information. Thus, a potential space model is built based on the POMDP model and LSTM network, as shown in fig. 4, which is a potential space model diagram.
By way of example, historical information is extracted through the long short-term memory neural network, the overall state is inferred from the historical information through an encoder model formed by an observation model and a hidden state transition model, and a reward value is obtained through a reward model formed by a deterministic state transition model and a hidden state transition model, wherein:

the deterministic state transition model is expressed as: $P(d_t \mid d_{t-1}, a_{t-1})$;

the hidden state transition model is expressed as: $P(s_t \mid s_{t-1}, d_{t-1}, a_{t-1})$;

the observation model is expressed as: $P(o_t \mid s_t, d_t)$;

the reward model is expressed as: $P(r_t \mid s_t, d_t)$;

the encoder model is expressed as: $P(\hat{s}_t \mid s_t, o_t)$;

wherein $P$ represents a state transition probability and $f$ represents the LSTM network function used to extract the historical information;

$d_t$ represents the deterministic state of time slot $t$; the deterministic state transition model gives the state transition probability of obtaining the deterministic state $d_t$ when the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$s_t$ represents the hidden state of time slot $t$; the hidden state transition model gives the state transition probability of obtaining the hidden state $s_t$ when the hidden state $s_{t-1}$, the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$o_t$ represents the observation state of time slot $t$; the observation model gives the state transition probability of obtaining the observation state $o_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$r_t$ represents the reward state of time slot $t$; the reward model gives the state transition probability of obtaining the reward state $r_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$\hat{s}_t$ represents the overall state; the encoder model gives the state transition probability of obtaining the overall state $\hat{s}_t$ when the hidden state $s_t$ and the observation state $o_t$ of time slot $t$ are input.
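A minimal PyTorch-style sketch of such a latent model is shown below; the layer sizes, the Gaussian parameterization of the hidden state, and the use of an LSTM cell for the deterministic path are assumptions made for illustration, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class LatentSpaceModel(nn.Module):
    """Deterministic LSTM path plus a stochastic hidden state (prior / encoder posterior)."""

    def __init__(self, obs_dim: int, action_dim: int, det_dim: int = 256, hid_dim: int = 32):
        super().__init__()
        self.rnn = nn.LSTMCell(hid_dim + action_dim, det_dim)        # deterministic transition
        self.prior = nn.Linear(det_dim, 2 * hid_dim)                  # hidden state transition
        self.posterior = nn.Linear(det_dim + obs_dim, 2 * hid_dim)    # encoder model
        self.reward_head = nn.Linear(det_dim + hid_dim, 1)            # reward model

    def step(self, s_prev, a_prev, d_prev, c_prev, obs=None):
        # Deterministic state transition: d_t from (s_{t-1}, a_{t-1}) via the LSTM cell.
        d_t, c_t = self.rnn(torch.cat([s_prev, a_prev], dim=-1), (d_prev, c_prev))

        # Use the posterior (encoder, conditioned on the observation) when available, else the prior.
        stats = self.posterior(torch.cat([d_t, obs], dim=-1)) if obs is not None else self.prior(d_t)
        mean, log_std = stats.chunk(2, dim=-1)
        s_t = mean + log_std.exp() * torch.randn_like(mean)           # sampled hidden state

        r_t = self.reward_head(torch.cat([d_t, s_t], dim=-1))         # predicted reward
        return s_t, d_t, c_t, r_t
```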
The application proposes a novel DMPPO (Dynamic Migration Proximal Policy Optimization) algorithm by combining a proximal policy optimization (PPO) algorithm with an actor–critic structure with the potential space model. The DMPPO algorithm uses an LSTM network to observe information, combines the potential space model to infer the hidden state of the environment, and makes migration decisions based on the inferred and observed information.
The proximal policy optimization deep learning algorithm includes an actor–critic structure that takes the hidden state inferred by the potential space model, i.e. the overall state $\hat{s}_t$, as input, with $\theta$ and $\phi$ denoting the parameters of the actor network and the critic network. The goal of the actor network is to approximate the policy $\pi_\theta(a_t \mid \hat{s}_t)$, the distribution over the output action space for a given time slot $t$; the critic network approximates the value $V_\phi(\hat{s}_t)$, an estimate of the expected return obtained by starting from $\hat{s}_t$ and then following the policy $\pi$.

The approximated policy is defined as:

$\pi_\theta(a_t \mid \hat{s}_t) = P(a_t \mid \hat{s}_t)$

The approximated value function is defined as:

$V_\phi(\hat{s}_t) = \mathbb{E}\Big[\sum_{k=0}^{T-t} \gamma^{k} r_{t+k}\Big]$

wherein $P(a_t \mid \hat{s}_t)$ represents the transition probability of the action decision $a_t$ and the overall state $\hat{s}_t$ of time slot $t$, $\gamma$ is the attenuation coefficient, $r$ is the immediate reward whose value varies with the state, the index $k$ runs from the start to the end, and $\mathbb{E}$ represents the expected value.

Further, the advantage function $A(\hat{s}_t, a_t)$ is used as the evaluation function of the critic network, and the parameterized advantage function is expressed as:

$A(\hat{s}_t, a_t) = Q(\hat{s}_t, a_t) - V(\hat{s}_t)$

wherein $Q(\hat{s}_t, a_t)$ is the action value function and $V(\hat{s}_t)$ is the state value function;

a generalized advantage function $\hat{A}_t$ is introduced to control the balance between bias and variance, and the generalized advantage is expressed as:

$\hat{A}_t = \sum_{q=0}^{T-t} (\gamma\lambda)^{q}\, \delta_{t+q}$

wherein $r$ represents a reward, $\lambda$ is used to adjust the trade-off between variance and bias, the index $q$ runs from zero to the end of slot $T$, $\delta_t$ is the time-slot difference error, and $V$ is the state value function;

$\delta_t = r_t + \gamma V(\hat{s}_{t+1}) - V(\hat{s}_t)$

wherein $r_t$ represents the reward of time slot $t$, $V(\hat{s}_{t+1})$ is the state value function of time slot $t+1$, $\gamma$ is the attenuation coefficient, and $V(\hat{s}_t)$ is the state value function of time slot $t$.
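A short sketch of this advantage computation is given below, assuming the standard backward recursion over a sampled trajectory; the default value of the trade-off parameter lam is an assumption, since the patent does not state it.

```python
import numpy as np

def generalized_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute generalized advantages from per-slot rewards and value estimates.

    `values` must contain one extra bootstrap entry V(s_T) at the end.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Time-slot difference error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```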
Further, the loss function of the critic network is defined as:

$L(\phi) = \mathbb{E}_t\big[\delta_t^{2}\big]$

The loss function of the actor network is defined as:

$L(\theta) = -\,\mathbb{E}_t\Big[\min\big(\rho_t(\theta)\hat{A}_t,\ \mathrm{clip}(\rho_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big) + \beta H(\pi_\theta)\Big]$

wherein $L(\phi)$ represents the critic network loss, $L(\theta)$ represents the actor network loss, $\theta$ and $\phi$ respectively represent the parameters of the actor network and the critic network, $\mathbb{E}_t$ denotes the expected value following the policy $\pi_\theta$ along the sampled trajectory $\tau$, $\rho_t(\theta)$ is the importance ratio, i.e. the ratio of the new objective policy to the current policy, $\mathrm{clip}$ represents the clipping function that clips values falling outside $(1-\epsilon, 1+\epsilon)$, $\epsilon$ represents a defined constant, $\hat{A}_t$ represents the generalized advantage function, $H$ represents the entropy of the policy, and $\beta$ is the entropy coefficient; introducing the entropy term increases the robustness of the model.
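A hedged sketch of the two losses follows; the clipping constant and entropy coefficient reuse the values stated later in this embodiment (0.1 and 0.01), while the exact composition of the losses is an assumption consistent with the description above.

```python
import torch

def ppo_losses(new_log_probs, old_log_probs, advantages, td_errors,
               entropy, clip_eps=0.1, entropy_coef=0.01):
    """Actor loss (clipped surrogate + entropy bonus) and critic loss (squared TD error)."""
    ratio = (new_log_probs - old_log_probs).exp()            # importance ratio rho_t(theta)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Negate the surrogate so that minimizing the loss maximizes the expected advantage.
    actor_loss = -(torch.min(ratio * advantages, clipped * advantages).mean()
                   + entropy_coef * entropy.mean())
    critic_loss = td_errors.pow(2).mean()
    return actor_loss, critic_loss
```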
Next, the training of the neural network will be described below.
It should be noted that, based on the historical information collected by the LSTM, multiple training cycles are performed with mini-batch gradient updates: the parameters $\theta$ and $\phi$ of the neural networks are updated by gradient descent that minimizes the loss functions of the actor and critic networks. Following the standard PPO structure, the actor and critic networks are parameterized by neural networks and take the hidden state collected by the LSTM as input. The trajectory $\tau$ sampled from the environment while following the policy $\pi_\theta$ is recorded; the critic network is updated by minimizing the time-slot difference error over the sampled trajectory, while the goal of the actor network is to find the optimal strategy that maximizes the cumulative reward. Finally, the parameter values of the target network, the policy network and the critic network are updated until convergence, yielding the final strategy. This completes the potential space model training stage.
Each training cycle of the migration algorithm consists of a sampling process and a target policy updating process. In the sampling process, the parameters of the actor network and the target network are first synchronized, and trajectories are then sampled from the system network environment using the actor network and the policy network. The target policy updating process trains the target policy on the sampled trajectories. Cyclic training is performed using Adam mini-batch stochastic gradient descent to update the parameters of the target network, the actor network, and the critic network.
Optionally, the model is optimized with the Adam optimizer, the discount coefficient is set to 0.99, the LSTM hidden layer size is set to 256, the hyperparameter epsilon is set to 0.1, the training batch size is set to 40, the training period is set to 120, the initial learning rate is set to 0.0005, and the entropy coefficient is set to 0.01.
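An illustrative training-loop skeleton reflecting the sampling / target-policy-update cycle and the hyperparameters listed above is sketched below; the environment interface, the collect_trajectory and update_minibatch helpers, and the minibatches method are assumed placeholders rather than the patent's implementation.

```python
import torch

config = {
    "discount": 0.99, "lstm_hidden": 256, "clip_eps": 0.1,
    "batch_size": 40, "epochs": 120, "lr": 0.0005, "entropy_coef": 0.01,
}

def train(env, actor, critic, latent_model, collect_trajectory, update_minibatch):
    """One possible DMPPO-style training cycle: sample trajectories, then update."""
    params = (list(actor.parameters()) + list(critic.parameters())
              + list(latent_model.parameters()))
    optimizer = torch.optim.Adam(params, lr=config["lr"])

    for epoch in range(config["epochs"]):
        # Sampling process: roll out the current policy in the system network environment.
        trajectory = collect_trajectory(env, actor, latent_model)

        # Target policy update: mini-batch Adam steps on the actor / critic losses.
        for batch in trajectory.minibatches(config["batch_size"]):
            optimizer.zero_grad()
            loss = update_minibatch(batch, actor, critic, latent_model, config)
            loss.backward()
            optimizer.step()
```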
Step S4: and inputting the verification set into the optimized migration decision model to finally obtain the optimal security service migration decision of the mobile user.
It should be noted that, since the optimized migration decision model includes a trained neural network, the network learns a mapping from environmental states to actions. Following the policy learned from the environment, the agent interacts with the environment: at each time step, the current state is input into the policy network to obtain the action output by the model, this action is then executed, and the feedback of the environment, including the new state and the reward, is observed.
The performance of the algorithm is tested using the trained neural network in the environment, with the cumulative reward, i.e., the negative time delay, calculated as the performance indicator. The reward quantifies the performance of the algorithm on the test tasks and aids tuning and improvement; according to the performance in the test stage, the parameters of the policy network are selected and adjusted to further improve performance.
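A brief evaluation sketch follows, accumulating the reward (negative delay) of the trained policy over validation episodes; the env.reset/env.step interface and the actor.act method are assumed placeholders.

```python
def evaluate(env, actor, latent_model, episodes: int = 10) -> float:
    """Average cumulative reward of the trained policy over validation episodes."""
    total = 0.0
    for _ in range(episodes):
        obs, done, state = env.reset(), False, latent_model.initial_state()
        while not done:
            action, state = actor.act(obs, state)   # policy output for the current state
            obs, reward, done = env.step(action)    # environment feedback: new state and reward
            total += reward
    return total / episodes
```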
In summary, in the security service migration method in a mobile edge computing scenario provided by the present application, a system network model is first constructed by simulating a real scene, and a delay-sum objective function is defined on the model; the total security delay is taken into account in the delay-sum objective function, which improves the security of migration decisions. By constructing a partially observable Markov decision process model, the hidden environment information of the mobile user in the migration state can be inferred; historical information is collected by a long short-term memory neural network, and this neural network is introduced into the proximal policy optimization algorithm to obtain a novel deep reinforcement learning algorithm. The deep reinforcement learning algorithm is combined with the partially observable Markov decision process model to construct a potential space model, the model is optimally trained so that the reward value obtained from the defined delay-sum objective function is maximized, and the migration decision finally obtained is the optimal security service migration strategy, which further improves the accuracy of migration decisions.
Referring to fig. 5, an experimental flowchart of a comparative example of a security service migration method in a mobile edge computing scenario provided in the present application is shown, and specific steps are as follows:
step S01: taxi track data sets of san francisco and roman at different moments are collected.
Step S02: the PPO algorithm introduced into LSTM network is defined as DMPPO algorithm.
Step S03: the DMPPO algorithm is compared with AM algorithm, NM algorithm and DRQN algorithm.
It should be noted that, the DMPPO migration algorithm proposed in the present application is improved from the near-end policy optimization algorithm PPO algorithm, so in experimental comparison, a conventional baseline migration method is used to perform comparison verification with a classical deep reinforcement learning AM algorithm, an NM algorithm and a DRQN algorithm.
Wherein, the Always Migate (AM) algorithm: the mobile user always selects the nearest edge computing server for migration at each time slot.
Never Migate (NM) algorithm: the service is placed at the edge computing server and no migration is performed for a time frame.
Deep Current Q-Learning (DRQN) algorithm: and taking the objective function as a training target, carrying out greedy exploration by using a classical DQN algorithm idea, and approximating a DRQN algorithm of the action cost function by using a cyclic neural network structure similar to the DMPPO algorithm.
Step S04: and obtaining the stability of the average total rewards of various algorithms through 5000 updating iteration times.
Fig. 6 shows a graph of a comparison of the average total prize of the track of san francisco by the different algorithms, and fig. 7 shows a graph of the average total prize of the track of roman by the different algorithms. The DMPPO algorithm was tested and compared with different algorithms and baseline strategies in the two trajectories, respectively. It should be noted that the baseline strategy does not involve the training process of the neural network, and thus directly demonstrates their final performance. In the two tracks, the rewarding values of the AM baseline algorithm and the NM baseline algorithm are stabilized at about-1500 and about-1600, the rewarding values of the DRQN algorithm after 3500 iterations are stabilized at about-1300, and the rewarding values of the DMPPO algorithm after 1500 iterations are stabilized at about-800. The results of the graph show that DMPPO is more stable than DRQM, and the average total prize is the largest. The DMPPO algorithm is superior to the three comparison algorithms on the two movement tracks of san Francisco and Roman.
Referring to fig. 8, a schematic structural diagram of a security service migration system in a mobile edge computing scenario according to an embodiment of the present application is shown, where the system includes:
the data acquisition and preprocessing module 10 is used for acquiring a moving track of a mobile user, marking track positions at different moments for the moving track to form a track data set, and dividing the track data set into a training set and a verification set;
the system network model building module 20 is configured to simulate a real target scene according to the trajectory data set, so as to build a system network model, set a delay sum objective function of a migration strategy according to the system network model, and take a negative value obtained by the delay sum objective function as a reward value;
a potential space model training module 30, configured to model the migration problem as a partially observable Markov decision process model, combine the partially observable Markov decision process model with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and cyclically train the potential space model on the training set until the reward value is maximized, so as to obtain an optimized migration decision model;
and the migration decision module 40 is configured to input the verification set into the optimized migration decision model, and finally obtain an optimal security service migration decision of the mobile user.
In another aspect, the present application further proposes a computer device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to implement the above-mentioned method for secure service migration in a mobile edge computing scenario when executing the computer program stored on the memory.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (10)

1. A method for secure service migration in a mobile edge computing scenario, the method comprising:
acquiring a moving track of a mobile user, marking track positions at different moments on the moving track to form a track data set, and dividing the track data set into a training set and a verification set;
simulating a real target scene according to the track data set to construct a system network model, setting a delay sum objective function of a migration strategy according to the system network model, and taking a negative value obtained by the delay sum objective function as a reward value;
modeling the migration problem as a partially observable Markov decision process model, wherein the partially observable Markov decision process model is combined with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and the training set is used to cyclically train the potential space model until the reward value is maximized, so as to obtain an optimized migration decision model;
and inputting the verification set into the optimized migration decision model to finally obtain the optimal security service migration decision of the mobile user.
2. The security service migration method of claim 1, wherein the delay-sum objective function is expressed as:

$D(a_t) = D_t^{\mathrm{mig}} + D_t^{\mathrm{comp}} + D_t^{\mathrm{comm}} + D_t^{\mathrm{sec}}$

wherein $t$ represents a time slot, $a_t$ represents the migration decision of time slot $t$, $D_t^{\mathrm{mig}}$ represents the migration delay of the mobile user, $D_t^{\mathrm{comp}}$ represents the computation delay, $D_t^{\mathrm{comm}}$ represents the total communication delay, and $D_t^{\mathrm{sec}}$ represents the total security delay.
3. The security service migration method according to claim 2, wherein the migration delay is expressed as:

$D_t^{\mathrm{mig}} = \eta \, h_t$

wherein $\eta$ represents the migration delay factor and $h_t$ represents the hop distance between the current service node and the previous service node;

the computation delay is expressed as:

$D_t^{\mathrm{comp}} = \dfrac{\rho_t \, \lambda_t}{\xi_t \, f}$

wherein $\lambda_t$ represents the number of CPU cycles required to complete the computing service, $\rho_t$ represents the computation offload fraction of the service generated by the user, $f$ represents the computing power of the service node, and $\xi_t$ represents the ratio of computing resources allocated to the service;

the total communication delay $D_t^{\mathrm{comm}}$ is expressed in terms of $\Lambda$, the sum of CPU cycles required to handle the offloaded service, $d$, the size of the uploaded service data, $C$, the channel capacity, $r$, the average data transmission rate from the base station to the edge server, and $H$, the number of hops required from the base station to the edge server over the backhaul network;

the total security delay is expressed as:

$D_t^{\mathrm{sec}} = D_t^{\mathrm{vul}} + D_t^{\mathrm{pro}}$

wherein $x_t$ represents the decision variable, $D_t^{\mathrm{vul}}$ represents the security vulnerability delay of time slot $t$, $D_t^{\mathrm{pro}}$ represents the security protection delay of time slot $t$, $l$ represents the security protection level of the service, $l^{e}$ represents the expected security level, $M$ represents the total number of tasks, $L$ represents the total number of security levels, and $\kappa_l$ represents the delay coefficient corresponding to each security protection level.
4. The security service migration method of claim 1, wherein the step of modeling the migration problem as a partially observable Markov decision process model combined with a proximal policy optimization deep learning algorithm incorporating a long short-term memory neural network to construct the potential space model comprises:

extracting historical information through the long short-term memory neural network, inferring the overall state from the historical information through an encoder model formed by an observation model and a hidden state transition model, and obtaining a reward value through a reward model formed by a deterministic state transition model and a hidden state transition model, wherein:

the deterministic state transition model is expressed as: $P(d_t \mid d_{t-1}, a_{t-1})$;

the hidden state transition model is expressed as: $P(s_t \mid s_{t-1}, d_{t-1}, a_{t-1})$;

the observation model is expressed as: $P(o_t \mid s_t, d_t)$;

the reward model is expressed as: $P(r_t \mid s_t, d_t)$;

the encoder model is expressed as: $P(\hat{s}_t \mid s_t, o_t)$;

wherein $P$ represents a state transition probability and $f$ represents the LSTM network function used to extract the historical information;

$d_t$ represents the deterministic state of time slot $t$; the deterministic state transition model gives the state transition probability of obtaining the deterministic state $d_t$ when the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$s_t$ represents the hidden state of time slot $t$; the hidden state transition model gives the state transition probability of obtaining the hidden state $s_t$ when the hidden state $s_{t-1}$, the deterministic state $d_{t-1}$ and the migration decision action $a_{t-1}$ of time slot $t-1$ are input;

$o_t$ represents the observation state of time slot $t$; the observation model gives the state transition probability of obtaining the observation state $o_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$r_t$ represents the reward state of time slot $t$; the reward model gives the state transition probability of obtaining the reward state $r_t$ when the hidden state $s_t$ and the deterministic state $d_t$ of time slot $t$ are input;

$\hat{s}_t$ represents the overall state; the encoder model gives the state transition probability of obtaining the overall state $\hat{s}_t$ when the hidden state $s_t$ and the observation state $o_t$ of time slot $t$ are input.
5. The security service migration method of claim 4, wherein the step of modeling the migration problem as a partially observable Markov decision process model combined with a proximal policy optimization deep learning algorithm incorporating a long short-term memory neural network to construct the potential space model comprises:

the proximal policy optimization deep learning algorithm includes an actor–critic structure that takes the hidden state inferred by the potential space model, i.e. the overall state $\hat{s}_t$, as input, with $\theta$ and $\phi$ denoting the parameters of the actor network and the critic network; the goal of the actor network is to approximate the policy $\pi_\theta(a_t \mid \hat{s}_t)$, the distribution over the output action space for a given time slot $t$, and the critic network approximates the value $V_\phi(\hat{s}_t)$, an estimate of the expected return obtained by starting from $\hat{s}_t$ and then following the policy $\pi$;

the approximated policy is defined as:

$\pi_\theta(a_t \mid \hat{s}_t) = P(a_t \mid \hat{s}_t)$

the approximated value function is defined as:

$V_\phi(\hat{s}_t) = \mathbb{E}\Big[\sum_{k=0}^{T-t} \gamma^{k} r_{t+k}\Big]$

wherein $P(a_t \mid \hat{s}_t)$ represents the transition probability of the action decision $a_t$ and the overall state $\hat{s}_t$ of time slot $t$, $\gamma$ is the attenuation coefficient, $r$ is the immediate reward whose value varies with the state, the index $k$ runs from the start to the end, and $\mathbb{E}$ represents the expected value.
6. The security service migration method of claim 5, wherein the step of modeling the migration problem as a partially observable Markov decision process model combined with a proximal policy optimization deep learning algorithm incorporating a long short-term memory neural network to construct the potential space model comprises:

using the advantage function $A(\hat{s}_t, a_t)$ as the evaluation function of the critic network, the parameterized advantage function being expressed as:

$A(\hat{s}_t, a_t) = Q(\hat{s}_t, a_t) - V(\hat{s}_t)$

wherein $Q(\hat{s}_t, a_t)$ is the action value function and $V(\hat{s}_t)$ is the state value function;

introducing a generalized advantage function $\hat{A}_t$ to control the balance between bias and variance, the generalized advantage being expressed as:

$\hat{A}_t = \sum_{q=0}^{T-t} (\gamma\lambda)^{q}\, \delta_{t+q}$

wherein $r$ represents a reward, $\lambda$ is used to adjust the trade-off between variance and bias, the index $q$ runs from zero to the end of slot $T$, $\delta_t$ is the time-slot difference error, and $V$ is the state value function;

$\delta_t = r_t + \gamma V(\hat{s}_{t+1}) - V(\hat{s}_t)$

wherein $r_t$ represents the reward of time slot $t$, $V(\hat{s}_{t+1})$ is the state value function of time slot $t+1$, $\gamma$ is the attenuation coefficient, and $V(\hat{s}_t)$ is the state value function of time slot $t$.
7. The security service migration method of claim 6, wherein the step of modeling the migration problem as a partially observable Markov decision process model combined with a proximal policy optimization deep learning algorithm incorporating a long short-term memory neural network to construct the potential space model comprises:

defining the loss function of the critic network as:

$L(\phi) = \mathbb{E}_t\big[\delta_t^{2}\big]$

and defining the loss function of the actor network as:

$L(\theta) = -\,\mathbb{E}_t\Big[\min\big(\rho_t(\theta)\hat{A}_t,\ \mathrm{clip}(\rho_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big) + \beta H(\pi_\theta)\Big]$

wherein $L(\phi)$ represents the critic network loss, $L(\theta)$ represents the actor network loss, $\theta$ and $\phi$ respectively represent the parameters of the actor network and the critic network, $\mathbb{E}_t$ denotes the expected value following the policy $\pi_\theta$ along the sampled trajectory $\tau$, $\rho_t(\theta)$ is the importance ratio, i.e. the ratio of the new objective policy to the current policy, $\mathrm{clip}$ represents the clipping function that clips values falling outside $(1-\epsilon, 1+\epsilon)$, $\epsilon$ represents a defined constant, $\hat{A}_t$ represents the generalized advantage function, $H$ represents the entropy of the policy, and $\beta$ is the entropy coefficient.
8. The security service migration method of claim 7, wherein the step of modeling the migration problem as a partially observable Markov decision process model combined with a proximal policy optimization deep learning algorithm incorporating a long short-term memory neural network to construct the potential space model comprises:

updating the parameters $\theta$ and $\phi$ by gradient descent that minimizes the loss functions of the actor network and the critic network.
9. A security service migration system in a mobile edge computing scenario, the system comprising:
the system comprises an acquisition data and preprocessing module, a data acquisition and preprocessing module and a data processing module, wherein the acquisition data and preprocessing module is used for acquiring a moving track of a mobile user, marking track positions at different moments for the moving track to form a track data set, and dividing the track data set into a training set and a verification set;
the system network model building module is used for simulating a real target scene according to the track data set so as to build a system network model, setting a delay sum objective function of a migration strategy according to the system network model, and taking a negative value obtained by the delay sum objective function as a reward value;
the potential space model training module is used for modeling the migration problem as a partially observable Markov decision process model, combining the partially observable Markov decision process model with a proximal policy optimization deep learning algorithm integrated with a long short-term memory neural network to construct a potential space model, and cyclically training the potential space model with the training set until the reward value is maximized, so as to obtain an optimized migration decision model;
and the migration decision module is used for inputting the verification set into the optimized migration decision model to finally obtain the optimal security service migration decision of the mobile user.
10. A computer device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to implement a method for secure service migration in a mobile edge computing scenario according to any one of claims 1-8 when executing a computer program stored on the memory.
CN202410166467.7A (priority date 2024-02-06, filing date 2024-02-06) — Security service migration method and system in mobile edge computing scene — Active — granted as CN117750436B (en)

Priority Applications (1)

CN202410166467.7A — priority/filing date 2024-02-06 — Security service migration method and system in mobile edge computing scene — granted as CN117750436B (en)
Publications (2)

CN117750436A — published 2024-03-22
CN117750436B (en) — published 2024-04-30

Family

ID=90253031

Family Applications (1)

CN202410166467.7A — Active — CN117750436B (en) — priority/filing date 2024-02-06 — Security service migration method and system in mobile edge computing scene

Country Status (1)

CN: CN117750436B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021247254A1 (en) * 2020-06-05 2021-12-09 Google Llc End-to-end deep neural network adaptation for edge computing
US20220255790A1 (en) * 2021-01-29 2022-08-11 Beijing University Of Posts And Telecommunications Deep reinforcement learning-based information processing method and apparatus for edge computing server
KR20220115244A (en) * 2021-02-10 2022-08-17 삼성전자주식회사 Method and apparatus for providing edge computing service
KR20220117838A (en) * 2021-02-17 2022-08-24 금오공과대학교 산학협력단 Security method using edge computing-based generative adversarial neural network and security system using the same
US20230362095A1 (en) * 2022-05-05 2023-11-09 Zhengzhou University Of Light Industry Method for intelligent traffic scheduling based on deep reinforcement learning
CN116233926A (en) * 2023-02-21 2023-06-06 重庆邮电大学 Task unloading and service cache joint optimization method based on mobile edge calculation
CN116489712A (en) * 2023-04-25 2023-07-25 北京交通大学 Mobile edge computing task unloading method based on deep reinforcement learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MING ZHAO et al.: "C-LSTM: CNN and LSTM Based Offloading Prediction Model in Mobile Edge Computing (MEC)", IEEE, 22 July 2022 (2022-07-22)
TIANQING ZHOU et al.: "Secure and Multistep Computation Offloading and Resource Allocation in Ultradense Multitask NOMA-Enabled IoT Networks", IEEE, 18 August 2023 (2023-08-18)
卢海峰; 顾春华; 罗飞; 丁炜超; 杨婷; 郑帅: "Research on task offloading in mobile edge computing based on deep reinforcement learning" (基于深度强化学习的移动边缘计算任务卸载研究), Journal of Computer Research and Development (计算机研究与发展), no. 07, 7 July 2020 (2020-07-07)
唐伦; 周钰; 谭颀; 魏延南; 陈前斌: "Reinforcement learning based virtual network function migration algorithm for 5G network slicing" (基于强化学习的5G网络切片虚拟网络功能迁移算法), Journal of Electronics & Information Technology (电子与信息学报), no. 03, 15 March 2020 (2020-03-15)
葛海波; 冯安琪; 王妍: "Offloading strategy for workflow tasks in a 5G edge computing environment" (5G边缘计算环境下工作流任务的卸载策略), Transducer and Microsystem Technologies (传感器与微系统), no. 08, 23 July 2020 (2020-07-23)

Also Published As

CN117750436B (en) — published 2024-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant