CN115484304A - Real-time service migration method based on lightweight learning - Google Patents


Info

Publication number: CN115484304A (granted as CN115484304B)
Application number: CN202210921760.0A
Authority: CN (China)
Legal status: Granted; Active
Prior art keywords: service, migration, delay, learning, agent
Other languages: Chinese (zh)
Inventors: 陈晗頔, 王小洁, 宁兆龙, 亓伟敬, 宋清洋, 郭磊, 陈博宇
Original and current assignee: Chongqing University of Post and Telecommunications

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a real-time service migration method based on lightweight learning, which constructs a service collaborative migration framework for dynamic edge networks and formulates a dual-objective optimization problem to optimize service performance and cost simultaneously. To solve this problem, an offline expert strategy based on the global state is proposed to provide optimal results as expert trajectories. To realize real-time service collaborative migration based only on observable states, the invention provides a lightweight online agent strategy based on imitation learning that imitates the expert trajectories and uses meta-updates to accelerate model migration. Experimental results show that, compared with other representative algorithms, the proposed scheme significantly improves migration performance and reduces training cost, with clear advantages on multiple indicators such as service delay and payment cost under different workloads.

Description

Real-time service migration method based on lightweight learning
Technical Field
The invention relates to a cooperative migration method of real-time services in a dynamic edge network, in particular to a service migration algorithm based on imitation learning and a model migration acceleration algorithm based on meta learning.
Background
Enhanced mobile broadband has pushed 5G into commercial reality. With the transition to 6G, the rapid expansion of smart devices and the explosive growth of real-time applications have brought forward advanced services such as holographic communication, digital twins and augmented reality, which generate large amounts of data that must be processed in time; global mobile traffic is projected to reach 1 ZB/month by 2028, equivalent to 5 billion users worldwide each consuming 200 GB per month. These stringent computational requirements pose a significant challenge for resource-limited edge networks, whose imperfect device capabilities struggle to satisfy the strict timeliness requirements of physical services under limited edge resources.
The high cost of updating or maintaining hardware limits the commercialization of new services. To guarantee the performance of real-time services, resources (including computation, communication and caching) are reserved according to the requirements announced in the service session. However, service execution requires heterogeneous resources across multiple edge devices and depends heavily on the global network state. Since information is isolated on separate devices, edge devices cannot observe the global state due to limited communication capabilities; yet frequent interaction with a central node, such as a base station or other infrastructure with powerful sensors, burdens the network and threatens private information. A fundamental problem is therefore how to design lightweight, distributed agent strategies that enable autonomous service collaboration among devices and optimal real-time decisions, especially in dynamic edge networks. The challenges in studying this problem are as follows:
1. Resource contention is more intense among mobile devices with limited energy. A single service provider not only increases the rental burden but also reduces resource-utilization efficiency. How to jointly schedule services and manage heterogeneous resources to optimize the quality of experience of service requesters is therefore worth studying.
2. Users in the real world are selfish and rational, with differing willingness to rent out resources. An efficient pricing mechanism is therefore needed to incentivize devices and serve requesters by striking a satisfactory tradeoff between stable but competitive infrastructure resources and decentralized but available device resources.
3. The training cost, communication load, and convergence speed of learning algorithms can cause a dramatic drop in time-sensitive quality of service. Designing a lightweight learning strategy that supports online distributed decision-making is quite challenging.
Disclosure of Invention
The invention aims to design an efficient heterogeneous-resource integration scheme that optimizes the performance and cost of real-time services, and establishes a dynamic edge system supporting cooperative migration of real-time services. To minimize the delay and payment of service execution, the invention designs a lightweight continual-imitation service cooperative-migration algorithm: an offline matching-based expert strategy provides expert trajectories for the agents, and a distributed agent strategy is trained by imitation learning on the obtained expert dataset, minimizing the error between state-action pair distributions to fit the expert strategy. The method avoids the high learning load of traditional algorithms and reduces learning cost, and uses meta-updates to accelerate model training and realize lightweight continual imitation.
The main inventive content is summarized as follows:
1. The invention constructs an intelligent service cooperative-migration framework based on combinatorial resource optimization and provides a pricing mechanism that reflects service-cooperation willingness. The problem is formulated as a dual-objective optimization problem minimizing execution latency and payment, and is decomposed, via analysis of the optimal execution latency, into selecting the execution devices and determining the optimal migration ratio.
2. The invention provides an imitation-Learning-based Online Service collaborative migration strategy (LOS). An offline expert strategy obtains optimal matching results to generate an expert-trajectory dataset for the agents.
3. The invention provides a lightweight online agent strategy that makes online decisions by imitating the obtained expert-trajectory dataset. To overcome staleness of the expert dataset, the invention applies meta-learning to accelerate model migration when updating agent strategies, reducing the effort of continuously training the models.
In view of the above, the technical scheme adopted by the invention is as follows: a real-time service migration method based on lightweight learning comprises the following steps:
1) Constructing a dynamic edge network model. Areas are divided according to the communication capacity of the infrastructure; each area contains service providers and service requesters. Service migration is executed in discrete time slots. A user terminal can act both as a service requester and as a service provider, and a service generated by a requester can be partially migrated to other devices for execution. Migration execution of a service consists of three steps: input, execution and output. The requester splits the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost.
2) Formulating the service migration problem. The service delay and the migration payment cost are taken as the indexes of service cooperative-migration performance and cost, respectively, yielding a dual-objective optimization problem.
3) The infrastructure makes an optimal matching strategy based on the observed global state.
4) The expert dataset is passed to the agents, which train the agent policy by imitation learning.
5) The agent trains its policy on the expert dataset and accelerates the model-update process with a meta-learning strategy, eliminating the learning cost of a traditional neural network and reducing the traditional learning load. Every d time slots form an update period; in each period the expert-trajectory dataset is refreshed and provided to the distributed agents for learning. Each device must learn and update its policy independently from its observable information to ensure the accuracy of the policy.
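A minimal sketch of the agent-side training in steps 4) and 5): the agent policy is fit to the expert trajectory by behavior cloning, and the meta update here is a first-order (Reptile-style) rule chosen for illustration. The linear policy, learning rates, and toy threshold expert are all assumptions, not the patent's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def clone_step(W, states, expert_actions, lr=0.5):
    """One behavior-cloning step: move the linear policy softmax(s @ W)
    toward the expert's action labels (cross-entropy gradient)."""
    probs = softmax(states @ W)
    onehot = np.eye(W.shape[1])[expert_actions]
    grad = states.T @ (probs - onehot) / len(states)
    return W - lr * grad

def meta_update(W_meta, trajectories, inner_steps=100, meta_lr=0.3):
    """Reptile-style meta update (illustrative): adapt a copy of the
    meta-parameters on each expert trajectory, then move the
    meta-parameters toward the adapted weights."""
    for states, actions in trajectories:
        W = W_meta.copy()
        for _ in range(inner_steps):
            W = clone_step(W, states, actions)
        W_meta = W_meta + meta_lr * (W - W_meta)
    return W_meta

# Toy expert trajectory: migrate (action 1) when the observed load
# (first feature) exceeds 0.5; a constant bias feature lets the
# linear policy learn the threshold.
raw = rng.random((200, 1))
states = np.hstack([raw, np.ones((200, 1))])   # [load, bias]
actions = (raw[:, 0] > 0.5).astype(int)

W = meta_update(np.zeros((2, 2)), [(states, actions)])
accuracy = (softmax(states @ W).argmax(axis=1) == actions).mean()
```

In the patent's setting the expert trajectories would come from the offline matching strategy and be refreshed every d slots; here a synthetic threshold expert merely stands in for them.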
The invention has the following advantages and beneficial effects:
1. The invention constructs an intelligent service cooperative-migration framework based on combinatorial resource optimization, achieving full utilization of resources by jointly optimizing heterogeneous resources. A pricing mechanism reflecting service-cooperation willingness is then provided, so that the state of a service provider is reflected in its price. The problem is formulated as a dual-objective optimization problem minimizing execution latency and payment, thereby optimizing execution performance and cost simultaneously, and is decomposed, by analyzing the optimal execution latency, into selecting execution devices and determining the optimal migration ratio.
2. The invention provides an online service cooperative-migration strategy based on imitation learning. An offline expert strategy obtains optimal matching results to generate an expert-trajectory dataset for the agents; through matching, this strategy yields optimal migration results on which the agents train their local models.
3. The invention provides a lightweight online agent strategy that makes online decisions by imitating the obtained expert-trajectory dataset. To overcome staleness of the expert dataset, the invention applies meta-learning to accelerate model migration when updating agent strategies, reducing continuous-training effort. By retaining part of the prior knowledge and recording the migration process, the agent's policy can be updated at low load, so the agent updates its training model with very little work, the update process is accelerated, and execution in practice becomes more efficient.
Drawings
FIG. 1 is a diagram of an illustrative system model for service migration in a dynamic network;
FIG. 2 is a service migration illustration;
FIG. 3 is a schematic diagram of the variation of the percentage of power consumed, the available CPU frequency and the rent;
FIG. 4 is a graph of the accuracy performance of the algorithm proposed by the present invention and other representative algorithms for different update rounds;
FIG. 5 is a graph of the performance of the execution times of the algorithm proposed by the present invention and other representative algorithms for different update rounds;
FIG. 6 is a graph of mobility profiles for the proposed algorithm and other representative algorithms under low workload;
FIG. 7 is a graph of mobility distribution for the proposed algorithm of the present invention and other representative algorithms at high workload;
FIG. 8 is a graph of achievable QoS distribution for the proposed algorithm and other representative algorithms under low workload;
FIG. 9 is a graph of achievable QoS distribution for the proposed algorithm and other representative algorithms under high workload;
FIG. 10 is a graphical illustration of the effect of service data size on average latency for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 11 is a graphical illustration of the effect of service data size on average latency for the algorithm proposed by the present invention and other representative algorithms at high workload;
FIG. 12 is a graphical illustration of the effect of service data size on average pay cost for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 13 is a graphical illustration of the effect of service data size on average pay cost for the proposed algorithm and other representative algorithms under high workload;
FIG. 14 is a graphical illustration of the impact of service data size on average energy consumption ratio for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 15 is a schematic diagram of the effect of service data size on the average energy consumption ratio of the algorithm of the present invention and other representative algorithms at high workload;
FIG. 16 is a graph illustrating the effect of service data size on the average time-to-live gain of the proposed algorithm and other representative algorithms at low workload;
FIG. 17 is a graph illustrating the effect of service data size on the average time-to-live gain of the proposed algorithm and other representative algorithms at high workload;
FIG. 18 is a graphical illustration of the effect of communication range on average delay for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 19 is a graph illustrating the effect of communication range on average delay for the algorithm of the present invention and other representative algorithms under high workload;
FIG. 20 is a graphical illustration of the effect of range on average payment for the proposed algorithm and other representative algorithms of the present invention at low workload;
FIG. 21 is a graphical illustration of the effect of communication range on average payment cost for the algorithm proposed by the present invention and other representative algorithms at high workload;
FIG. 22 is a graphical illustration of the effect of range on average power consumption for the algorithm of the present invention and other representative algorithms at low workload;
FIG. 23 is a schematic diagram of the effect of communication distance on the average power consumption ratio of the algorithm of the present invention and other representative algorithms under high operating load;
FIG. 24 is a graphical illustration of the effect of range on average time-to-live gain for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 25 is a graphical illustration of the effect of range on average time-to-live gain of the proposed algorithm and other representative algorithms at high workload;
FIG. 26 is a graph showing the effect of the number of classes of service on the average latency of the proposed algorithm and other representative algorithms under low workload;
FIG. 27 is a graph illustrating the effect of the number of classes of service on the average latency of the proposed algorithm and other representative algorithms under high workload;
FIG. 28 is a graphical illustration of the impact of the number of service classes on the average payment for the proposed algorithm and other representative algorithms under low workload;
FIG. 29 is a graphical illustration of the impact of the number of classes of service at high workload on the average payment for the proposed algorithm and other representative algorithms of the present invention;
FIG. 30 is a graphical illustration of the impact of the number of classes of service on the average energy consumption ratio of the proposed algorithm and other representative algorithms at low workload;
FIG. 31 is a graphical illustration of the impact of the number of classes of service under high workload on the average energy consumption ratio of the proposed algorithm and other representative algorithms;
FIG. 32 is a graphical illustration of the impact of the number of classes of service on the average time-to-live gain of the proposed algorithm and other representative algorithms at low workload;
FIG. 33 is a graph illustrating the effect of number of service classes on average time-to-live gain for the algorithm proposed by the present invention and other representative algorithms at high workload.
Detailed Description
In order to show the advantages of the present invention more clearly and in detail, the following description will further describe the embodiments of the present invention with reference to the drawings.
The invention provides an efficient service cooperative migration framework, aims to design an efficient heterogeneous resource integration scheme to provide optimized service performance and service cost of real-time service, and provides a lightweight learning scheme based on imitation learning by analyzing the optimal migration rate of service cooperative migration.
Step 1):
FIG. 1 is an illustrative system-model diagram for service migration in a dynamic network. As shown, a dynamic edge network divides areas according to the communication capabilities of the infrastructure; each area contains service providers and service requesters. To capture dynamic conditions, service migration is performed in discrete time slots. A user terminal (such as a vehicle or a smart device) can act as a service provider while also being a service requester, and a service generated by a requester can be partially migrated to another device for execution.
The detailed migration execution process of a service is divided into three steps, input, execution and output, as shown in the service-migration illustration of FIG. 2. The input of a service comprises two parts: the service data and the data packets the service requires. The service requester splits the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost.
In time slot $t$, the set of randomly arriving services, of size $n_t$, can be represented as $\mathcal{S}(t) = \{S_1(t), S_2(t), \dots, S_{n_t}(t)\}$, where $S_i(t)$ is the service request generated by device $D_i(t)$. For the different services, $k_i \in \{1, 2, \dots, K\}$ indicates the service class, and $K$ represents the total number of service classes.
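For concreteness, the per-slot request model above can be sketched as follows; the field names, value ranges, and the Poisson arrival rate are illustrative assumptions, not the patent's notation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ServiceRequest:
    requester: int       # index i of the generating device D_i(t)
    service_class: int   # class k in {1, ..., K}
    data_mb: float       # service input-data size
    cycles: float        # computing resources required to execute

def generate_requests(t, n_devices=10, n_classes=5, rate=3.0, seed=0):
    """Draw the random number n_t of requests arriving in slot t
    (Poisson arrivals, matching the later queuing assumption) and
    attach randomly drawn demands."""
    rng = np.random.default_rng(seed + t)
    n_t = int(min(n_devices, rng.poisson(rate)))
    return [ServiceRequest(requester=int(rng.integers(n_devices)),
                           service_class=int(rng.integers(1, n_classes + 1)),
                           data_mb=float(rng.uniform(1.0, 50.0)),
                           cycles=float(rng.uniform(1e8, 1e9)))
            for _ in range(n_t)]

requests = generate_requests(t=1)
```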
Step 1.1):
The details of the service execution model are as follows:
The scenario studied comprises two communication modes, i.e. device-to-device communication and device-to-infrastructure communication. The achievable communication rate $r_{ij}(t)$ between two devices can be calculated from the Shannon formula:

$$r_{ij}(t) = B_{ij} \log_2\big(1 + \Gamma_{ij}(t)\big),$$

where $B_{ij}$ represents the communication bandwidth between device $i$ and device $j$, and $\Gamma_{ij}(t)$ represents the signal-to-interference-plus-noise ratio between device $i$ and device $j$ in time slot $t$. Once the communication conditions between device $i$ and device $j$ satisfy the restrictions, a communication link can be established.
To guarantee communication quality, it is assumed that a user terminal communicates with only one user device at a time, i.e. device-to-device communications do not interfere with each other, so

$$\Gamma_{ij}(t) = \frac{p_i^{comm}(t)\, g_{ij}(t)}{\sigma^2},$$

where $p_i^{comm}(t)$ denotes the communication transmission power of device $D_i(t)$, $g_{ij}(t)$ denotes the channel gain between devices $D_i(t)$ and $D_j(t)$, and $\sigma^2$ denotes the additive white Gaussian noise power. Accordingly, if device $D_i(t)$ and the infrastructure $R(t)$ satisfy the communicability conditions, a communication link can be constructed based on non-orthogonal multiple access, and the signal-to-interference-plus-noise ratio $\Gamma_{ir}(t)$ can be calculated as

$$\Gamma_{ir}(t) = \frac{p_i^{comm}(t)\, g_{ir}(t)}{\sigma^2 + \sum_{j \in \mathcal{D}(t)\setminus\{i\}} p_j^{comm}(t)\, g_{jr}(t)},$$

where $p_j^{comm}(t)$, $g_{jr}(t)$ and $\mathcal{D}(t)$ represent the communication power of the other devices, their channel gains to the infrastructure, and the device set, respectively. In time slot $t$, a service provider may receive more than one transmission request from other devices. The invention sets each request to follow first-come-first-served service, with arrivals following a Poisson distribution. Each user terminal has only one service table and can accommodate up to $N$ requests, so the service requests received by each device can be modeled as an M/G/1 queuing system. The transmission waiting delay $T_{ij}^{wait}(t)$ then follows the Pollaczek-Khinchine mean-wait formula:

$$T_{ij}^{wait}(t) = \frac{\lambda\big(\bar{t}_{ij}^{\,2} + \theta^2\big)}{2\big(1 - \lambda\, \bar{t}_{ij}\big)},$$

where $\lambda$ represents the transmission intensity of the tasks, $\bar{t}_{ij}$ represents the average transmission delay between the two devices, and $\theta^2$ represents the variance of the transmission delay. The communication delay $T_{ij}^{comm}(t)$ can then be calculated as $T_{ij}^{comm}(t) = T_{ij}^{wait}(t) + T_{ij}^{tran}(t)$, where $T_{ij}^{tran}(t)$ represents the transmission delay of the task data.
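The link-level quantities above can be computed directly. The sketch below uses the Shannon rate, the interference-free device-to-device SINR, and the standard Pollaczek-Khinchine mean-wait formula for an M/G/1 queue, consistent with the queuing assumption stated above; function names and numbers are illustrative.

```python
import math

def shannon_rate(bandwidth_hz, sinr):
    """Achievable rate r_ij(t) = B_ij * log2(1 + Gamma_ij(t))."""
    return bandwidth_hz * math.log2(1.0 + sinr)

def sinr_d2d(p_tx, gain, noise):
    """Interference-free device-to-device SINR (one peer at a time)."""
    return p_tx * gain / noise

def mg1_wait(arrival_rate, mean_service, service_var):
    """Mean waiting time of an M/G/1 queue (Pollaczek-Khinchine):
    W = lam * E[S^2] / (2 * (1 - lam * E[S]))."""
    rho = arrival_rate * mean_service
    assert rho < 1.0, "queue must be stable"
    second_moment = service_var + mean_service ** 2
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))

rate = shannon_rate(1e6, sinr_d2d(0.1, 1e-6, 1e-9))  # bits per second
wait = mg1_wait(arrival_rate=2.0, mean_service=0.1, service_var=0.01)
comm_delay = wait + 8e6 / rate   # queueing plus transmitting a 1 MB task
```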
When no service is found in the available devices, the service provider needs to download the data packets required for the service from the network and store them with sufficient remaining storage resources. The present invention recognizes that different packets buffered in a device follow a random distribution. The buffered packets may be shared with other communicable devices. Due to the scarcity of spectrum resources communicating with the infrastructure, the infrastructure can only download packets from the network.
After obtaining all of the input data, the service provider can supply computing resources to perform the service. The achievable service-execution processing rate $v_{ij}(t)$ (in megabytes per second) can be calculated as

$$v_{ij}(t) = \frac{d_i(t)\, \big(\alpha_{ij}(t)\, f_j(t) + \alpha_{ir}(t)\, R^{comp}(t)\big)}{c_i(t)},$$

where $d_i(t)$ is the data size of service $S_i(t)$, $\alpha_{ij}(t)$ is the task-migration decision variable (with $i = j$ indicating that the service executes locally, and $\alpha_{ir}(t)$ indicating execution on the infrastructure), $f_j(t)$ is the available computing resource of device $D_j(t)$, $R^{comp}(t)$ is the available computing resource of the infrastructure, and $c_i(t)$ is the computing resource required by service $S_i(t)$.
Based on the above model, the delay of executing service $S_i(t)$ on device $D_j(t)$ comprises four parts: the service-data acquisition delay, the acquisition delay of the packets the service requires, the execution delay, and the feedback delay. According to the migration ratio $\gamma_i(t)$, the service is divided into a migrated part and a locally executed part. The invention defines a binary decision variable $\alpha_{ij}(t)$ to indicate the selected service provider; when $\alpha_{ir}(t) = 1$, the service is migrated to the infrastructure for execution. A binary decision variable $\beta_{ijh}(t)$ indicates the packet-sharing device; when the corresponding condition holds, the data packet required by the service is obtained by downloading. Thus, the local execution delay $T_i^{loc}(t)$ can be calculated as

$$T_i^{loc}(t) = T_i^{l,comp}(t) + T_i^{l,pkt}(t),$$

i.e. the sum of the local computing delay $T_i^{l,comp}(t)$ and the local packet-acquisition delay $T_i^{l,pkt}(t)$. The local computing delay is calculated as

$$T_i^{l,comp}(t) = \frac{\big(1 - \gamma_i(t)\big)\, c_i(t)}{f_i(t)},$$

where $\gamma_i(t)$ is the migration ratio of service $S_i(t)$, $c_i(t)$ is the computing resource required to execute $S_i(t)$, and $f_i(t)$ is the computing capability of device $D_i(t)$.
The local packet-acquisition delay $T_i^{l,pkt}(t)$ is as follows:

$$T_i^{l,pkt}(t) = \beta_{iih}(t)\left(\frac{s_h^{pkt}}{r_{ih}^{pkt}(t)} + T_{ih}^{wait}(t)\right) + \big(1 - \beta_{iih}(t)\big)\, \frac{s_h^{pkt}}{r_i^{down}(t)},$$

where $\beta_{iih}(t)$ is the packet-acquisition decision variable, $s_h^{pkt}$ is the data-packet size, $r_{ih}^{pkt}(t)$ is the communication rate for obtaining the packet locally, $T_{ih}^{wait}(t)$ is the transmission waiting delay, and $r_i^{down}(t)$ is the packet download rate.
Meanwhile, the migration execution delay $T_i^{mig}(t)$ is calculated as

$$T_i^{mig}(t) = T_{ij}^{comm}(t) + T_{ij}^{comp}(t) + T_{ij}^{pkt}(t),$$

where $T_{ij}^{comm}(t)$ is the communication delay between the two devices, $T_{ij}^{comp}(t)$ is the computing delay on the executing device, and $T_{ij}^{pkt}(t)$ is the acquisition delay of the data packets the service requires. These terms are determined by the decision variables and resource states: $\alpha_{ij}(t)$ is the binary decision variable selecting the execution device; $\gamma_i(t)$ is the migration-ratio decision variable of service $S_i(t)$; $d_i(t)$ and $d_i^{out}(t)$ are the input-data and output-data sizes of $S_i(t)$; $r_{ij}(t)$ is the communication rate and $T_{ij}^{wait}(t)$ the communication waiting delay between the two devices; $\beta_{ijh}(t)$ is the packet-acquisition decision variable, $s_h^{pkt}$ the required packet size, $r_j^{down}(t)$ the device download rate, and $c_h^{pkt}$ the computing resource required for the packet; $f_j(t)$ is the computing resource available at the device, $R^{comp}(t)$ the infrastructure's available computing resource, and $R^{down}(t)$ the infrastructure's data download rate.
Since the local and migrated parts are executed in parallel, the total service execution delay $T_i(t)$ can be obtained as

$$T_i(t) = \max\big\{T_i^{loc}(t),\; T_i^{mig}(t)\big\},$$

i.e. the maximum of the local execution delay $T_i^{loc}(t)$ and the migration execution delay $T_i^{mig}(t)$.
Step 1.2):
The rent model is detailed as follows:
Owing to users' rationality and selfishness, a fair incentive mechanism is needed to facilitate device cooperation. In the invention, the unit rental price $p_j^{rent}(t)$ of computing resources varies with the state of device $D_j(t)$ and is defined as a function of two factors, weighted by a price coefficient $\kappa$: the available computing capability $f_j(t)$ and the remaining available energy $e_j(t)$ of the device. Both factors are negatively correlated with the unit rent, reflecting the willingness trend for profiting from leased resources. The pricing function is divided into two parts, reflecting the different sensitivities of computing power and remaining available energy to pricing, respectively; an exponential function is selected to represent the higher sensitivity to battery charge. If $e_j(t)$ is extremely low, then whatever computing resources $D_j(t)$ has available, the price is raised to avoid failures caused by excessive power consumption.
FIG. 3 shows an example of the impact of the two relevant state factors on pricing for $\kappa = 0.5$; the level of the price reflects a service provider's propensity to lease resources to serve a requester. The horizontal axis represents time slots, and the vertical axis represents the values of remaining energy, available computing resources, and rent. Clearly, when the remaining energy is quite low, the rent rises dramatically, no matter how much computing resource is available, to prevent a collapse from power drain.
Infrastructure deployed in the real world has a fixed power supply, so its remaining energy can be considered sufficient at all times. The rent function of the infrastructure is therefore calculated with the energy term fixed at 1 (i.e. the remaining available energy is always sufficient), where R comp (t) is the available computing resources of the infrastructure and κ is the price coefficient. The corresponding energy consumption is calculated from γ i (t), the mobility of service S i (t); the local computation delay and e comp , the percentage of energy consumed per computation unit; the local download delay and e down , the percentage of energy consumed per download unit; and the communication delay and e comm , the percentage of energy consumed per communication unit.
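The energy bookkeeping above can be sketched as follows. The exact combination of terms is an image in the original, so the mobility weighting in `migration_energy` is an assumption; only the listed factors (the mobility γ i (t), the three delays, and e comp , e down , e comm ) come from the text.

```python
def migration_energy(gamma, t_comp_local, t_down_local, t_comm,
                     e_comp=0.02, e_down=0.01, e_comm=0.03):
    """Hedged sketch of the per-service energy consumption: the locally
    kept share (1 - gamma) pays computation and download energy, the
    migrated share gamma pays communication energy. The split is an
    illustrative assumption; the patent's formula is an image."""
    local_part = (1.0 - gamma) * (t_comp_local * e_comp + t_down_local * e_down)
    migrated_part = gamma * t_comm * e_comm
    return local_part + migrated_part

# Fully local execution consumes no communication energy.
assert migration_energy(0.0, 2.0, 1.0, 3.0) == 2.0 * 0.02 + 1.0 * 0.01
```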
Step 2):
The detailed steps of the optimization objective construction are as follows:
To reduce the influence of the time-varying heterogeneous resource states on the performance of service cooperative migration, the service delay and the migration payment cost are used as the indexes of cooperative-migration performance and cost respectively. The dual-objective optimization problem P1 minimizes, over the execution slots of length T, the average service execution delay T i (t) and the average resource lease cost P i (t), where α ij (t) is the service migration device decision variable, β ijh (t) is the service data packet acquisition decision variable, γ i (t) is the service mobility decision variable, and S is the total number of service requests to be executed. P1 is subject to the constraints C1-C7:
Constraint C1 ensures that the execution delay of a service cannot exceed its tolerable delay, guaranteeing the user's quality of experience, where T i (t) is the service execution delay and the bound is the tolerable delay of K i -class services. Constraint C2 ensures that the migration portion of each service completes within the communicable time, i.e. the migration execution delay must not exceed the communicable delay between the two devices. Constraint C3 ensures that each service provider does not exhaust its remaining energy, preventing service interruption due to energy exhaustion; it bounds the execution energy consumption by the remaining energy of every device D i (t) in the device set. C4 defines the upper limit of the device's communication capacity with the infrastructure, where α ij (t) is the device migration decision variable and R ch (t) is the upper limit on the number of channels. Constraint C5 restricts the binary decision variables α ij (t) and β ijh (t), the decision variables of the device migration and service data packet acquisition modes respectively, with n t the total number of devices. C6: γ i (t) ∈ [0,1] gives the value range of the service mobility. Constraint C7 indicates that when the mobility γ i (t) = 0, no service provider provides cooperation.
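The constraint set C1-C7 can be checked mechanically. The inequalities themselves are images in the original, so the sketch below encodes only their described meanings (delay bound, communicable-time bound, energy bound, channel cap, binary variables, mobility range, and the no-cooperation case); the dictionary field names are invented for illustration.

```python
def feasible(decision, limits):
    """Check one candidate migration decision against constraints in the
    spirit of C1-C7; the exact inequalities are images in the source,
    so this is an illustrative sketch with assumed field names."""
    c1 = decision["exec_delay"] <= limits["tolerable_delay"]         # C1: QoE delay bound
    c2 = decision["mig_delay"] <= limits["communicable_delay"]       # C2: finish while communicable
    c3 = decision["energy_used"] <= limits["remaining_energy"]       # C3: no energy exhaustion
    c4 = decision["channels"] <= limits["channel_limit"]             # C4: channel cap
    c5 = decision["alpha"] in (0, 1) and decision["beta"] in (0, 1)  # C5: binary variables
    c6 = 0.0 <= decision["gamma"] <= 1.0                             # C6: mobility range
    c7 = decision["gamma"] > 0 or decision["alpha"] == 0             # C7: no provider => no migration
    return all((c1, c2, c3, c4, c5, c6, c7))
```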
And step 3):
The constructed optimization problem P1 is transformed as follows:
Since the purpose of problem P1 is to minimize the average performance of service cooperative migration, the invention minimizes the per-slot average delay T i (t) and cost P i (t), converting P1 into a problem P2 constrained by C1-C7. Because of the mobility, the service execution delay T i (t) is lowest when the local execution delay equals the migration execution delay, so P2 can be rewritten as a problem P3, also constrained by C1-C7. Since the two decision variables α ij (t) and β ijh (t) are coupled, in order to evaluate the Pareto-optimal solution the invention defines a utility metric expressing the optimal cost; with it, the joint optimization problem can be decomposed into two sub-problems: P4, constrained by C3-C5, and P5, constrained by C1, C2, and C7.
And step 4):
The detailed steps of acquiring the expert trajectory are as follows:
The system of the present invention involves multiple devices and multiple migrated services simultaneously. In time slot t, the service requesters and the service providers can be constructed as two disjoint sets of entities. The benefit of migrating to each device can be derived from the observed global state, so the presented problem can be translated into a matching problem that maximizes the overall benefit.
Step 4.1):
At the beginning of each time slot in an update round, the device match counts D j (t).visit and the service match counts S i (t).visit are first initialized to 0; then the preference value of each device is initialized to 0, and the adjustment parameters Δ j (t) and the adjustment factor δ are initialized to ∞.
step 4.2):
For each service request, the optimal mobility for execution on each candidate migration device is first obtained, and the matching decisions α ij (t) and β ijh (t) are derived from it. The lower limit of the mobility is determined by the tolerable delay of the K i -class service together with the local packet acquisition delay and the local computation delay. When the communicable time is the binding restriction, the upper limit of the mobility is determined by the communicable delay between the two devices, the communication waiting delay, the packet acquisition delay, the communication delay, and the computation delay; otherwise, the upper limit is determined by the tolerable delay of the service together with the same communication waiting, packet acquisition, communication, and computation delays. Since the optimal delay is attained when the local delay and the migration delay are equal, the optimal mobility can be expressed in terms of the local packet acquisition delay, the local computation delay, the migration execution delay, the packet acquisition delay, the communication delay, and the computation delay. The actual execution delay of the task can then be observed: if it exceeds what the service can tolerate, γ i (t) = 0, and otherwise the mobility is obtained by bounding the optimal mobility within its lower and upper limits.
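The balance condition of step 4.2), local delay equal to migration delay, can be sketched in closed form, assuming both sides scale linearly with the shares (1 − γ) and γ. That linear scaling, and the clipping to the lower and upper limits, are assumptions, since the original formulas are images.

```python
def optimal_mobility(t_get_l, t_comp_l, t_get, t_comm, t_comp,
                     gamma_lb=0.0, gamma_ub=1.0):
    """Hedged sketch of the optimal mobility: balancing
    (1 - gamma) * (t_get_l + t_comp_l) = gamma * (t_get + t_comm + t_comp)
    gives the closed form below, projected onto [gamma_lb, gamma_ub].
    Delay arguments follow the factors listed in the text; the linear
    model of how delay scales with the migrated share is assumed."""
    local = t_get_l + t_comp_l            # local acquisition + computation delay
    mig = t_get + t_comm + t_comp         # migration acquisition + comm + computation delay
    gamma = local / (local + mig)
    return min(max(gamma, gamma_lb), gamma_ub)
```

With equal local and migration unit delays the optimum splits the service evenly (γ = 0.5), and the projection enforces the limits derived from the tolerable and communicable delays.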
step 4.3):
For each attempted migration device, if the constraints C1-C7 are satisfied, the benefit U ij (t) is added to the preference list of service S i (t) in descending order; otherwise γ i (t) is set to 0 and the corresponding benefit U ij (t) is added to the preference list. A priority value is then obtained for each service request from all preference values, as the maximum preference value over all services.
Step 4.4):
for service S in service request set i (t) device set
Figure BDA0003777850300000122
And executing matching operation, wherein the specific execution process is as follows: from the collection
Figure BDA0003777850300000123
In is S i (t) finding a suitable matching procedure for the performing device. Defining an expected value U ij (t) is
Figure BDA0003777850300000124
And
Figure BDA0003777850300000125
and (4) the sum. If it satisfies
Figure BDA0003777850300000126
Then S i (t) migration to device D j (t) and returning a matching result. Otherwise, the tuning parameters Δ are matched j (t) needs to be updated to
Figure BDA0003777850300000127
Wherein
Figure BDA0003777850300000128
For service S i (t) a preference value for the value of (t),
Figure BDA0003777850300000129
is a device D j Preference value of (t), U ij (t) is a desired value.
Step 4.5):
if no matching result is returned in step 4.4), an update operation is performed to update the list of tuning variables for the device that has not been previously matched
Figure BDA00037778503000001210
The adjustment factor is updated to min { delta, delta j (t) }, where δ is the adjustment factor initialized to ∞, Δ j (t) adjusting all accessed service preference values to adjustment variables
Figure BDA00037778503000001211
Adjusting all vehicle preferences to
Figure BDA00037778503000001212
And all the adjustment variables Delta j (t) update to Δ j (t)-δ。
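The matching of steps 4.2)-4.5) can be approximated by a much simpler greedy assignment over the benefits U ij (t). The preference lists and adjustment parameters Δ j (t), δ of the actual procedure are replaced here by a single descending sweep, so this is a sketch of the matching objective rather than the patented algorithm.

```python
def match_services(benefit):
    """Greedy stand-in for the benefit-maximizing matching: benefit[i][j]
    is U_ij(t) for service i on device j; each service and each device
    is matched at most once, taking pairs in descending benefit order."""
    pairs = sorted(((u, i, j) for i, row in enumerate(benefit)
                    for j, u in enumerate(row)), reverse=True)
    matched_s, matched_d, result = set(), set(), {}
    for u, i, j in pairs:
        if i not in matched_s and j not in matched_d:
            result[i] = j
            matched_s.add(i)
            matched_d.add(j)
    return result
```

A one-to-one benefit-maximizing matching of this kind is exactly what the expert node computes from the global state; the deferred adjustment of preference values in step 4.5) resolves the conflicts that the greedy sweep handles by first-come order here.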
The invention sets the infrastructure as the expert node, which obtains the complete global state and constructs the expert trajectory <s(t), a(t)>. The execution phase is divided into batches, each containing a number of state-action pairs used to update the imitation policy. To realize real-time service migration in the edge network, the invention provides a lightweight distributed online agent imitation strategy: after the expert strategy completes, the expert trajectories are collected as a data set and transmitted to the agent training strategy as needed.
Step 5):
The detailed steps of the online agent strategy are as follows:
In a dynamic edge network, devices are treated as distributed agents, and agent policies are trained to make migration decisions by imitating expert trajectories and approximating the expert policy. However, an excessively large expert trajectory data set creates a huge communication burden and becomes obsolete over time, so the agent must retrain its model to prevent performance loss, a repetitive process that consumes enormous computing resources. Aiming at this problem, the invention provides a lightweight online agent strategy that continuously imitates the updated expert trajectory from a small number of demonstrations.
The imitation learning process involves two participants: the expert and the agents. d time slots are set as an update period; in each update period the expert trajectory data set is updated and provided to the distributed agents for learning. The update period is indexed by l, and the data set ε l contains d sampled trajectory data used to construct the expert strategy. Each device must learn independently and update its policy independently based on observable information to ensure the accuracy of the strategy. The agent strategy is updated in the following steps:
step 5.1):
The initial model is pre-trained to provide prior knowledge before the model is updated. After obtaining the initial expert demonstration data set ε 0 and the expert strategy, each agent obtains an initial agent model by training a neural network. The agent network estimates an action based on the observed state and trains its policy by fitting the observed states and the estimated action distribution to the expert strategy π e (a, s) according to a loss function, where the agent policy is parameterized by θ, a denotes the actual action, s the observed state, θ 0 the initial parameters, and the loss is an expectation over the demonstration data evaluated with frozen parameters. The parameters are then updated by gradient descent: each step moves θ opposite the gradient of the loss function with step size l b , the learning rate of the basic learner.
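Step 5.1) is standard behavior cloning: fit the agent network to expert state-action pairs by gradient descent on an imitation loss. A minimal sketch with a linear policy and a squared-error loss follows; the patent's actual loss function and network are images in the original, so both choices are assumptions.

```python
def clone_policy(demos, lr=0.1, epochs=200):
    """Minimal behavior-cloning sketch of step 5.1): fit a linear policy
    a_hat = w*s + b to expert (state, action) pairs by stochastic
    gradient descent on a squared imitation loss (assumed form)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for s, a in demos:
            grad = (w * s + b) - a   # gradient of 0.5*(a_hat - a)^2 w.r.t. a_hat
            w -= lr * grad * s
            b -= lr * grad
    return w, b

# Expert demonstrations follow a = 2*s; the cloned policy approaches it.
w, b = clone_policy([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
```

The learning rate `lr` plays the role of the basic learner's rate l b in the text.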
Step 5.2):
in the refresh period
Figure BDA00037778503000001312
In (1),
Figure BDA00037778503000001313
representing a set of update periods. The agent obtains a partially updated expert trajectory epsilon l . To speed up the process of repeated model migration, the present invention utilizes meta-learning to record the scaling and translation of model migration. The meta-learning parameter in the period l is denoted as ω l . The meta learning process will
Figure BDA00037778503000001314
Is converted into
Figure BDA00037778503000001315
By passing
Figure BDA00037778503000001316
To obtain omega l . The goal of meta-learning is to make
Figure BDA00037778503000001317
Is similar to
Figure BDA00037778503000001318
The meta-update of an agent comprises two sub-phases, namely basic learner training and meta-learner training. In the first period, randomly extracting the expert track epsilon from the data set e,l Then sampling
Figure BDA00037778503000001319
Number of stripsTraining basic learning model according to the data, sampling
Figure BDA00037778503000001320
To train meta model learning, an
Figure BDA00037778503000001321
Temporary parameter θ' l From the parameter theta of the l-1 period l-1 The initialization is derived and used for fine tuning, updated as:
Figure BDA00037778503000001322
wherein l b Based on the learning rate of the base learner,
Figure BDA00037778503000001323
to gradient the loss function of the basis learner,
Figure BDA00037778503000001324
to freeze the parameter, θ l-1 Parameter of period l-1, ω l-1 Are meta learner parameters. Thus the parameter omega of the meta learner l The updating is as follows:
Figure BDA00037778503000001325
wherein l m Based on the learning rate of the base learner,
Figure BDA00037778503000001326
to solve the gradient of the loss function of the meta-learner,
Figure BDA00037778503000001327
is a freezing parameter, θ' l As a temporary parameter, ω l-1 Is the meta-learner parameter for the l-1 period. Thus agent parameter θ l Can be updated as:
Figure BDA00037778503000001328
wherein l m Based on the learning rate of the base learner,
Figure BDA00037778503000001329
to solve the gradient of the loss function of the meta-learner,
Figure BDA00037778503000001330
is a freezing parameter, θ' l As a temporary parameter, ω l Is the meta-learner parameter for the l period.
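The three updates of step 5.2) share one shape: step a parameter opposite a loss gradient scaled by a learning rate. A minimal sketch follows, with `grad_base` and `grad_meta` standing in for the basic learner's and meta learner's gradients (the actual loss functions are images in the original, so these stand-ins and the scalar parameters are assumptions).

```python
def meta_update(theta_prev, omega_prev, grad_base, grad_meta,
                l_b=0.1, l_m=0.05):
    """Sketch of step 5.2)'s two sub-phases: basic-learner fine-tuning
    of the temporary parameter theta', then meta-learner and agent
    parameter updates with the meta learning rate l_m."""
    theta_tmp = theta_prev - l_b * grad_base  # basic-learner fine-tuning (theta')
    omega = omega_prev - l_m * grad_meta      # meta-learner parameter update
    theta = theta_tmp - l_m * grad_meta       # agent parameter update under omega
    return theta, omega

theta, omega = meta_update(1.0, 0.5, grad_base=2.0, grad_meta=1.0)
```

Initializing `theta_prev` from the previous period is what lets the agent keep prior knowledge while adapting to the refreshed expert data set.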
Step 5.3):
after completing the first training of the agent, the distributed agent according to the strategy
Figure BDA00037778503000001331
And making a migration decision based on the observed state until the agent enters the coverage of other infrastructures or until the (l + 1) th updating period, repeating the 2 nd stage by the agent for updating, wherein the updating process of the agent model can continuously imitate the expert strategy in a lightweight way, and the expert data set is effectively adapted while some known prior knowledge is kept.
Through the above steps, the cooperative migration provided by the invention is realized. Figures 4 and 5 show the efficiency of the invention. The expert trajectory needs to be updated at intervals to prevent performance loss due to data obsolescence and to ensure its timeliness. As shown in fig. 4, the runtime of the proposed LOS agent policy drops sharply when the expert trajectory is updated. Combined with the accuracy performance of fig. 5, the LOS strategy without migration must retrain the agent strategy directly on the updated data set to make decisions, which wastes prior knowledge and reduces accuracy on the smaller data set; the LOS agent strategy is clearly better suited to the continuous updates required in long-term scenarios, with an accuracy around 0.74.
Fig. 6-9 illustrate the average mobility and achievable quality of service distribution for 10 update periods at low and high workloads, respectively. According to the proposed mobility acquisition scheme, the value of the optimal mobility depends on the selected service provider and the request state. The gap in achievable quality of service between high and low loads is quite small as shown in fig. 8 and 9. Based on this, the service requester of the proposed LOS policy (including the LOS agent policy and the LOS expert policy) achieves the highest quality of service except for full offloading of the service, proving the high efficiency of the proposed LOS scheme.
The policy performance at different service data sizes is shown in figs. 10-17. Figs. 10 and 11 show the average latency with increasing service data size at low and high workloads, respectively. The delay of the LOS agent policy increases from 0.91 to 3.07 seconds at low workload and from 116.85 to 426.34 seconds at high workload, just above the LOS expert policy. The LOS agent policy can accommodate different workloads by balancing communication and computational load with an adjustable migration ratio, making a reasonable trade-off between local and migrated devices. Figs. 14 and 15 show the average service-processing energy consumption percentages at different workloads. The average energy consumption percentage clearly increases with the amount of service data; combined with the decreasing payment cost in figs. 12 and 13, the LOS expert policy obtains an optimal decision that reduces costs while suppressing the rate of energy-consumption increase, providing a better expert trajectory for the LOS agent policy to imitate. Figs. 16 and 17 evaluate the average lifetime increase of service requesters at low and high workloads, respectively. The rapid drop in lifetime gain shown in fig. 17 indicates that the communication consumption exceeds the computation consumption saved at high workloads; the LOS agent strategy achieves a near-optimal lifetime gain by trading off the different workloads.
Different communication distance limitations are shown in fig. 18-25. Fig. 18 and 19 show latency performance at different workloads, respectively, where the LOS agent policy has significant advantages over other policies. Fig. 22 and 23 illustrate energy consumption at low and high workloads, respectively. The power consumption of the LOS agent increases at low workload and decreases at high workload, indicating that the service requester is more inclined to migrate services by leasing resources at high load, at the expense of slight latency and power consumption, as can also be explained in fig. 18 and 20. As shown in fig. 22, 23, 24, 25, the LOS agent policy can more flexibly adapt to different communicable restrictions by generating a global state-action distribution through an LOS expert policy that approximates an optimal result, reducing local power consumption and thus extending the life cycle of local devices.
Service performance evaluation with policies of different numbers of service classes is shown in fig. 26-33. Experiments were performed from 3 to 9 classes of service to evaluate the generalization of the algorithm in case of multiple classes of service. Under the same experiment condition, the more the cache content types are, the lower the cache hit rate between the communication devices is. Fig. 26, 27 illustrate that LOS expert policy integration can take into account the communication, computation, and buffering states of the communicable devices under the same conditions, resulting in minimal latency. The LOS agent policy with timely update policy has good emulation performance based on limited observation states. Figures 28, 29 evaluate the pay-cost performance, demonstrating a satisfactory tradeoff between cache content and migration portion for the pay-cost of LOS agent policy, the adaptation of LOS agent policy to increasing classes of service, and the ability of agent policy to accurately model expert decision distribution and obtain near-optimal decisions. As shown in fig. 30, 31, the average energy consumption of the LOS agent policy is more stable than other algorithms under different number of service classes. Under different workloads, there is only a small gap between LOS agent policies from 3 to 9 service classes, thereby improving the lifetime gains assessed in fig. 32 and 33. This is not only because the LOS agent policy considers the status of both the service requester and provider, but also because the LOS agent policy is able to obtain a global state fit based on the partially observed status. The performance gain of the LOS agent policy rises with the increase in the number of service classes, indicating that LOS can effectively adapt to multiple service class scenarios.
The above technical solutions represent only embodiments of the present invention and are not the most complete or precise solutions; as technology innovates and the era advances, more reasonable and efficient changes may be made to the solution. The exemplary embodiments were chosen and described in order to explain the principles and application of the invention, and to help researchers and technicians understand and practice its details. All such modifications and variations are intended to be included within the scope of the invention, which is determined by the following claims and their equivalents.

Claims (7)

1. A real-time service migration method based on lightweight learning is characterized by comprising the following steps:
1) Constructing a dynamic edge network model: regions are divided according to the communication capacity of the infrastructure, and a region comprises service providers and service requesters; service migration is executed in discrete time slots; a user terminal can act as a service provider while acting as a service requester; a service generated by a service requester can be partially migrated to other devices for execution; the migration execution process of a service is divided into the three steps of input, execution, and output; and the service requester divides the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost;
2) Resolving a service migration problem; respectively taking service delay and migration payment cost as indexes of service cooperative migration performance and cost to construct a dual-target optimization problem;
3) The infrastructure makes an optimal matching strategy based on the observed global state;
4) Transmitting the expert data set to the agent for the agent to train an agent strategy based on the imitation learning;
5) The intelligent agent trains the agent strategy on the expert data set and accelerates the model update process with a meta-learning strategy, eliminating the learning cost of a traditional neural network and reducing the traditional learning load; d time slots are set as an update period, the expert trajectory data set is updated in each update period and provided to the distributed intelligent agents for learning, and each device learns and updates its strategy independently from observable information to ensure the accuracy of the strategy.
2. The real-time service migration method based on lightweight learning according to claim 1, wherein step 1) specifically comprises constructing the service delay and the migration payment cost;
1.1 the service delay is composed of the local execution delay and the migration execution delay; the local execution delay consists of the local computation delay and the local packet acquisition delay; the migration execution delay consists of the communication delay between the two devices, the computation delay of the device, and the delay of acquiring the data packets required by the service;
1.2 the migration payment is calculated as follows:
the lease price of computing resources varies with the state of D j (t) and is defined as a function of the available computing power and the remaining energy, where the parameter κ is a price coefficient adjusting their impact on the unit rent;
the rent function of the infrastructure is calculated with the energy term fixed at 1, meaning the remaining available energy is always sufficient, where R comp (t) is the infrastructure's available computing resources and κ is the price factor; the corresponding energy consumption is then calculated from γ i (t), the mobility of service S i (t); the local computation delay and e comp , the percentage of energy consumed per computation unit; the local download delay and e down , the percentage of energy consumed per download unit; and the communication delay and e comm , the percentage of energy consumed per communication unit.
3. The real-time service migration method based on lightweight learning according to claim 1, wherein the optimization problem P1 of step 2) minimizes, over the execution slots of length T, the average service execution delay T i (t) and the average resource lease cost P i (t), where α ij (t) is the service migration device decision variable, β ijh (t) is the service data packet acquisition decision variable, γ i (t) is the service mobility decision variable, and S is the total number of service requests to be executed; P1 is subject to the constraints C1-C7:
constraint C1 ensures that the execution delay of a service cannot exceed its tolerable delay, guaranteeing the user's quality of experience, where T i (t) is the service execution delay and the bound is the tolerable delay of K i -class services; constraint C2 ensures that the migration portion of each service completes within the communicable time, i.e. the migration execution delay must not exceed the communicable delay between the two devices; constraint C3 ensures that each service provider does not exhaust its remaining energy, preventing service interruption due to energy exhaustion, bounding the execution energy consumption by the remaining energy of every device D i (t) in the device set; C4 defines the upper limit of the device's communication capacity with the infrastructure, where α ij (t) is the device migration decision variable and R ch (t) is the upper limit on the number of channels; constraint C5 restricts the binary decision variables α ij (t) and β ijh (t), the decision variables of the device migration and service data packet acquisition modes respectively, with n t the total number of devices; C6: γ i (t) ∈ [0,1] gives the value range of the service mobility; constraint C7 indicates that when the mobility γ i (t) = 0, no service provider provides cooperation.
4. The real-time service migration method based on lightweight learning according to claim 1 or 3, wherein the optimal matching strategy of step 3) decomposes the optimization problem P1 into two sub-problems: P4, constrained by C3-C5, and P5, constrained by C1, C2, and C7.
5. The real-time service migration method based on lightweight learning according to claim 4, wherein: the step 4) specifically comprises the following steps:
step 4.1):
at the beginning of each time slot in an updating round, the matching times D of the equipment are initialized firstly j (t), visit, and number of service matches S i (t) visit is 0, wherein
Figure FDA0003777850290000034
Then, the preference value of each device is initialized to 0, i.e.
Figure FDA0003777850290000035
And initializing the tuning parameters
Figure FDA0003777850290000036
Is ∞;
step 4.2):
for each service request, firstly, the optimal mobility executed on each migration device is obtained, and the matching decision alpha is obtained according to the obtained optimal mobility ij (t) and beta ijh (t), lower limit of mobility
Figure FDA0003777850290000037
Comprises the following steps:
Figure FDA0003777850290000038
wherein
Figure FDA0003777850290000039
Is K i The tolerable delay of the service is such that,
Figure FDA00037778502900000310
is a homeThe time delay of the data packet is obtained,
Figure FDA00037778502900000311
for calculating the time delay locally when
Figure FDA00037778502900000312
Upper limit of mobility
Figure FDA00037778502900000313
Comprises the following steps:
Figure FDA00037778502900000314
wherein
Figure FDA00037778502900000315
In order to delay the communication between the two devices,
Figure FDA00037778502900000316
in order to wait for the communication to be delayed,
Figure FDA00037778502900000317
the time delay is obtained for the data packet,
Figure FDA00037778502900000318
in order to delay the time of communication,
Figure FDA00037778502900000319
to calculate the time delay. When the temperature is higher than the set temperature
Figure FDA00037778502900000320
Figure FDA00037778502900000321
Upper limit of mobility
Figure FDA00037778502900000322
Comprises the following steps:
Figure FDA00037778502900000323
where
Figure FDA00037778502900000324
is the tolerable delay of the service,
Figure FDA00037778502900000325
is the waiting delay,
Figure FDA00037778502900000326
is the packet acquisition delay,
Figure FDA00037778502900000327
is the communication delay, and
Figure FDA00037778502900000328
is the computation delay. Since the optimal delay is achieved when the local delay and the migration delay are equal, the optimal migration ratio
Figure FDA00037778502900000329
can be expressed as:
Figure FDA0003777850290000041
where
Figure FDA0003777850290000042
is the local packet acquisition delay,
Figure FDA0003777850290000043
is the local computation delay,
Figure FDA0003777850290000044
is the migrated execution delay,
Figure FDA0003777850290000045
is the packet acquisition delay,
Figure FDA0003777850290000046
is the communication delay,
Figure FDA0003777850290000047
is the computation delay, and
Figure FDA0003777850290000048
is the actual execution delay of the task;
if
Figure FDA0003777850290000049
then γ_i(t) = 0; and when
Figure FDA00037778502900000410
Figure FDA00037778502900000411
the migration ratio is obtained as follows:
Figure FDA00037778502900000412
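The ratio computation in step 4.2) can be sketched in Python under a linear delay model. This is an illustration only, not the claim's exact closed forms (those appear only as formula images); the linear split of computation between the local and migrated branches, and all names, are assumptions:

```python
def migration_ratio(d_local_pkt, d_local_comp, d_wait, d_pkt, d_comm, d_comp, d_tol):
    """Clip the delay-balancing migration ratio gamma to the feasible
    window [lower, upper] implied by the tolerable delay d_tol.

    Assumed linear model (an illustration, not the claim's closed form):
      local(gamma)  = d_local_pkt + (1 - gamma) * d_local_comp
      remote(gamma) = d_wait + d_pkt + d_comm + gamma * d_comp
    """
    # Optimal ratio: local delay equals migration delay (step 4.2's rule).
    gamma_star = (d_local_pkt + d_local_comp - d_wait - d_pkt - d_comm) \
                 / (d_local_comp + d_comp)
    # Lower bound: the local branch alone must meet the tolerable delay.
    lower = max(0.0, 1.0 - (d_tol - d_local_pkt) / d_local_comp)
    # Upper bound: the migrated branch alone must meet the tolerable delay.
    upper = min(1.0, (d_tol - d_wait - d_pkt - d_comm) / d_comp)
    if lower > upper:      # infeasible window: set gamma_i(t) = 0, stay local
        return 0.0
    return min(max(gamma_star, lower), upper)
```

With a 20-unit tolerable delay and symmetric 10-unit computation delays, the balanced ratio lands strictly inside the feasible window; when the tolerable delay is too tight the window is empty and the service stays local (γ_i(t) = 0), mirroring the branch in the claim.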
step 4.3):
for each candidate migration device, if constraints C1-C7 are satisfied, the benefit U_ij(t) is added to the preference list of service S_i(t) in descending order; otherwise the benefit U_ij(t) with γ_i(t) = 0 is added to the preference list; a priority value is then obtained for each service request based on all preference values, where
Figure FDA00037778502900000413
is the maximum preference value of the service;
step 4.4):
for each service S_i(t) in the service request set, a matching operation is executed against the device set
Figure FDA00037778502900000414
The specific procedure is as follows: a suitable match for S_i(t) is sought from the set
Figure FDA00037778502900000415
with the expected value U_ij(t) defined as
Figure FDA00037778502900000416
If both
Figure FDA00037778502900000417
and
Figure FDA00037778502900000418
hold, then S_i(t) is migrated to device D_j(t) and the matching result is returned; otherwise the tuning parameter δ_j(t) is updated to
Figure FDA00037778502900000419
where
Figure FDA00037778502900000420
is the preference value of service S_i(t),
Figure FDA00037778502900000421
is the preference value of device D_j(t), and U_ij(t) is the expected value.
Step 4.5):
if no matching result is returned in step 4.4), an update operation is performed: for each device not previously matched
Figure FDA00037778502900000422
the adjustment factor is updated to min{δ, Δ_j(t)}, where δ is the adjustment factor initialized to ∞; all accessed service preference values are adjusted by the adjustment variable to
Figure FDA00037778502900000423
all vehicle preference values are adjusted to
Figure FDA00037778502900000424
and all adjustment variables Δ_j(t) are updated to Δ_j(t) - δ.
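Steps 4.3)-4.5) together behave like a price-adjusted matching: preference lists ordered by benefit U_ij(t), a per-device tuning parameter that rises whenever a device is contested, and re-matching of evicted services. A compact stand-in is a Bertsekas-style auction; the sketch below assumes constraints C1-C7 have already filtered the utility table, and all names are hypothetical:

```python
def auction_match(utility, eps=1e-3):
    """Auction-style matching of services to migration devices.

    utility[s][d] is the benefit U_ij(t) of placing service s on device d
    (pairs infeasible under C1-C7 are assumed absent).  The device 'price'
    plays the role of the tuning parameter delta_j(t): it starts at 0 and
    rises each time the device is contested, reshaping the services'
    effective preferences in the spirit of step 4.5).
    """
    price = {d: 0.0 for prefs in utility.values() for d in prefs}
    owner = {}                          # device -> service currently holding it
    unmatched = list(utility)
    while unmatched:
        s = unmatched.pop()
        # effective preference after the price (adjustment) correction
        vals = {d: u - price[d] for d, u in utility[s].items()}
        best = max(vals, key=vals.get)
        second = sorted(vals.values())[-2] if len(vals) > 1 else 0.0
        price[best] += vals[best] - second + eps   # raise the adjustment
        if best in owner:                          # evict previous occupant
            unmatched.append(owner[best])
        owner[best] = s
    return {s: d for d, s in owner.items()}
```

On a two-service, two-device table the auction steers the contested device to the service that values it most at the margin, the same tie-breaking effect the preference/adjustment updates in steps 4.4)-4.5) aim for.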
6. The method according to claim 1, wherein the agent policy in step 5) is updated as follows:
step 5.1):
after obtaining an initial expert demonstration data set ε_0 and expert policy
Figure FDA00037778502900000425
each agent obtains an initial agent model by training a neural network; the agent network estimates actions
Figure FDA00037778502900000426
from observed states, and trains its policy by fitting the observed states and estimated actions according to a loss function
Figure FDA00037778502900000427
and the expert policy π_e(a|s); the loss function
Figure FDA00037778502900000428
is as follows:
Figure FDA0003777850290000051
where
Figure FDA0003777850290000052
denotes the agent policy, π_e(a|s) denotes the expert policy, a denotes the actual action, s denotes the observed state,
Figure FDA0003777850290000053
denotes the predicted action,
Figure FDA0003777850290000054
denotes the frozen parameter, θ_0 denotes the initial parameters, and
Figure FDA0003777850290000055
denotes the expectation; the parameters are therefore updated as follows:
Figure FDA0003777850290000056
where ι_b denotes the learning rate of the base learner, and
Figure FDA0003777850290000057
denotes the gradient of the loss function
Figure FDA0003777850290000058
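Step 5.1) is behaviour cloning: regress the agent's predicted actions onto the expert's. A minimal sketch with a linear policy and mean-squared loss standing in for the claim's neural network and loss function (names and the MSE choice are assumptions):

```python
import numpy as np

def bc_update(theta, states, expert_actions, lr=0.05):
    """One behaviour-cloning step: theta <- theta - iota_b * grad L(theta),
    with L the mean-squared error between the agent's predicted actions
    a_hat = states @ theta and the expert demonstration actions."""
    a_hat = states @ theta
    grad = 2.0 * states.T @ (a_hat - expert_actions) / len(states)
    return theta - lr * grad
```

Iterating this update on a fixed demonstration set drives the agent policy toward the expert policy, which is all the initial agent model of step 5.1) requires.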
step 5.2):
in each update period
Figure FDA0003777850290000059
where
Figure FDA00037778502900000510
denotes the set of update periods, the agent obtains a partially updated expert trajectory ε_l and records the scaling and shifting of model migration through meta-learning, with the meta-learning parameter in period l denoted as ω_l; the meta-learning process converts
Figure FDA00037778502900000511
into
Figure FDA00037778502900000512
and obtains ω_l through
Figure FDA00037778502900000513
the goal of meta-learning is to make
Figure FDA00037778502900000514
similar to
Figure FDA00037778502900000515
Step 5.3):
after the l-th training of the agent, the distributed agent follows the policy
Figure FDA00037778502900000516
and makes migration decisions based on the observed state until it enters the coverage of other infrastructure or until the (l+1)-th update period, at which point the agent repeats step 5.2) to update.
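The scaling-and-shifting of step 5.2) can be illustrated as fitting an affine map that carries the period-(l-1) parameters toward a policy fitted on the partially updated expert trajectory ε_l. The single shared (scale, shift) pair and the least-squares fit are simplifying assumptions; the claim's exact transform is given only by its formula images:

```python
import numpy as np

def fit_omega(theta, theta_target):
    """Fit omega_l = (scale, shift) so that scale * theta + shift
    approximates theta_target, the parameters fitted on the partially
    updated expert trajectory.  A shared scalar pair is used for brevity
    instead of per-coordinate vectors."""
    A = np.stack([theta, np.ones_like(theta)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, theta_target, rcond=None)
    return scale, shift

def transform(theta, omega):
    """Apply the scale-and-shift meta transform to the base parameters."""
    scale, shift = omega
    return scale * theta + shift
```

When the target parameters really are an affine image of the old ones, the fit recovers them exactly, which is the "make the transformed policy similar to the updated expert" goal stated above.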
7. The method according to claim 6, wherein: the meta update of the agent comprises two sub-stages, namely base learner training and meta learner training; in the l-th period, an expert trajectory ε_{e,l} is randomly extracted from the data set, then
Figure FDA00037778502900000517
samples of data are used to train the base learning model and
Figure FDA00037778502900000518
samples are used to train the meta learning model, with
Figure FDA00037778502900000519
the temporary parameter θ'_l is initialized from the parameter θ_{l-1} of period l-1 and used for fine tuning, updated as:
Figure FDA00037778502900000520
where ι_b is the learning rate of the base learner,
Figure FDA00037778502900000521
is the gradient of the loss function of the base learner,
Figure FDA00037778502900000522
is the frozen parameter, θ_{l-1} is the parameter of period l-1, and ω_{l-1} is the meta learner parameter; the meta learner parameter ω_l is thus updated as:
Figure FDA00037778502900000523
where ι_m is the learning rate of the meta learner,
Figure FDA00037778502900000524
is the gradient of the loss function of the meta learner,
Figure FDA00037778502900000525
is the frozen parameter, θ'_l is the temporary parameter, and ω_{l-1} is the meta learner parameter of period l-1; the agent parameter θ_l can thus be updated as:
Figure FDA00037778502900000526
where ι_m is the learning rate of the meta learner,
Figure FDA00037778502900000527
is the gradient of the loss function of the meta learner,
Figure FDA00037778502900000528
is the frozen parameter, θ'_l is the temporary parameter, and ω_l is the meta learner parameter of period l.
CN202210921760.0A 2022-08-02 2022-08-02 Lightweight learning-based live service migration method Active CN115484304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210921760.0A CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210921760.0A CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Publications (2)

Publication Number Publication Date
CN115484304A true CN115484304A (en) 2022-12-16
CN115484304B CN115484304B (en) 2024-03-19

Family

ID=84422715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210921760.0A Active CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Country Status (1)

Country Link
CN (1) CN115484304B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012149788A1 (en) * 2011-09-30 2012-11-08 Huawei Technologies Co., Ltd. Service establishment method and system, radio network controller and user terminal
CN110275758A (en) * 2019-05-09 2019-09-24 Chongqing University of Posts and Telecommunications Intelligent virtual network function migration method
CN111858009A (en) * 2020-07-30 2020-10-30 Aerospace Ouhua Information Technology Co., Ltd. Task scheduling method of mobile edge computing system based on migration and reinforcement learning
CN111885155A (en) * 2020-07-22 2020-11-03 Dalian University of Technology Vehicle-mounted task collaborative migration method for vehicle networking resource fusion
CN113114722A (en) * 2021-03-17 2021-07-13 Chongqing University of Posts and Telecommunications Virtual network function migration method based on edge network
US20210250838A1 (en) * 2018-10-22 2021-08-12 Huawei Technologies Co., Ltd. Mobile handover method and related device
CN113434212A (en) * 2021-06-24 2021-09-24 Beijing University of Posts and Telecommunications Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN113543074A (en) * 2021-06-15 2021-10-22 Nanjing University of Aeronautics and Astronautics Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
CN114362810A (en) * 2022-01-11 2022-04-15 Chongqing University of Posts and Telecommunications Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDREAS POLZE: "Timely Virtual Machine Migration for Pro-active Fault Tolerance", 2011 14TH IEEE INTERNATIONAL SYMPOSIUM ON OBJECT/COMPONENT/SERVICE-ORIENTED REAL-TIME DISTRIBUTED COMPUTING WORKSHOPS, 21 April 2011 (2011-04-21) *
LIU KUN: "Design and Simulation Implementation of a 5G-Based Satellite-Terrestrial Integrated Core Network", Information Science and Technology Series, 15 March 2022 (2022-03-15) *
TANG LUN; ZHOU YU; TAN QI; WEI YANNAN; CHEN QIANBIN: "Reinforcement Learning Based Virtual Network Function Migration Algorithm for 5G Network Slicing", Journal of Electronics & Information Technology, no. 03, 15 March 2020 (2020-03-15) *

Also Published As

Publication number Publication date
CN115484304B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
Chen et al. Deep reinforcement learning for computation offloading in mobile edge computing environment
Wang et al. Dynamic UAV deployment for differentiated services: A multi-agent imitation learning based approach
Zhou et al. Incentive-driven deep reinforcement learning for content caching and D2D offloading
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
Xie et al. Adaptive online decision method for initial congestion window in 5G mobile edge computing using deep reinforcement learning
CN110290011A (en) Dynamic Service laying method based on Lyapunov control optimization in edge calculations
Wu et al. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT
CN111262940A (en) Vehicle-mounted edge computing application caching method, device and system
CN114143891A (en) FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
Wang et al. Distributed reinforcement learning for age of information minimization in real-time IoT systems
CN113822456A (en) Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN115278708A (en) Mobile edge computing resource management method for federal learning
CN113573320A (en) SFC deployment method based on improved actor-critic algorithm in edge network
CN116489226A (en) Online resource scheduling method for guaranteeing service quality
CN116185523A (en) Task unloading and deployment method
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Nguyen et al. Intelligent blockchain-based edge computing via deep reinforcement learning: solutions and challenges
Huang et al. Reinforcement learning for cost-effective IoT service caching at the edge
Qadeer et al. Hrl-edge-cloud: Multi-resource allocation in edge-cloud based smart-streetscape system using heuristic reinforcement learning
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
Chen et al. Distributed task offloading game in multiserver mobile edge computing networks
CN111901833A (en) Unreliable channel transmission-oriented joint service scheduling and content caching method
CN115484304A (en) Real-time service migration method based on lightweight learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant