CN115484304A - Real-time service migration method based on lightweight learning - Google Patents


Info

Publication number: CN115484304A (granted as CN115484304B)
Application number: CN202210921760.0A
Authority: CN (China)
Legal status: Granted; Active
Prior art keywords: service, migration, delay, learning, agent
Other languages: Chinese (zh)
Inventors: 陈晗頔, 王小洁, 宁兆龙, 亓伟敬, 宋清洋, 郭磊, 陈博宇
Original and current assignee: Chongqing University of Post and Telecommunications

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a real-time service migration method based on lightweight learning, which constructs a service collaborative migration framework for dynamic edge networks and formulates a dual-objective optimization problem to optimize service performance and cost simultaneously. To solve this problem, an offline expert strategy based on the global state is proposed to provide optimal results as expert trajectories. To realize real-time service collaborative migration based only on observable states, the invention provides a lightweight online agent strategy based on imitation learning that imitates the expert trajectories and uses meta-updates to accelerate model migration. Experimental results show that, compared with other representative algorithms, the proposed scheme significantly improves migration performance and reduces training cost, with clear advantages on multiple indicators such as service delay and payment cost under different workloads.

Description

Real-time service migration method based on lightweight learning
Technical Field
The invention relates to a cooperative migration method of real-time services in a dynamic edge network, in particular to a service migration algorithm based on imitation learning and a model migration acceleration algorithm based on meta learning.
Background
Enhanced mobile broadband has pushed 5G into commercial reality. With the transition to 6G, the rapid expansion of smart devices and the explosive growth of real-time applications have brought forward advanced services such as holographic communication, digital twins and augmented reality, which generate large amounts of data that must be processed in time; global mobile traffic is projected to reach 1 ZB/month by 2028, equivalent to 5 billion users worldwide each consuming 200 GB per month. These stringent computational requirements pose a significant challenge for resource-limited edge networks, whose imperfect device capabilities struggle to satisfy the strict timeliness requirements of physical services under limited edge resources.
The high cost of updating or maintaining hardware limits the commercialization of new services. To guarantee the performance of real-time services, resources (including computation, communication and caching) are reserved according to the requirements announced in the service session. However, service execution requires heterogeneous resources across multiple edge devices and depends heavily on the global network state. Since information is isolated on separate devices, edge devices cannot observe the global state due to limited communication capabilities; yet frequent interaction with a central node, such as a base station or other infrastructure with powerful sensors, burdens the network and threatens private information. A fundamental problem is therefore how to design lightweight, distributed agent strategies that enable autonomous service collaboration among devices and optimal real-time decisions, especially in dynamic edge networks. The challenges in studying this problem are as follows:
1. Resource contention is more intense among mobile devices with limited energy. A single service provider not only increases the rental burden but also reduces resource-utilization efficiency. How to jointly schedule services and manage heterogeneous resources to optimize the quality of experience of service requesters is therefore worth studying.
2. Users in the real world are selfish and rational, with differing willingness to rent out resources. An efficient pricing mechanism is therefore needed to incentivize devices and serve requesters by striking a satisfactory tradeoff between stable but competitive infrastructure resources and decentralized but available device resources.
3. The training cost, communication load, and convergence speed of learning algorithms can cause a dramatic drop in time-sensitive quality of service. Designing a lightweight learning strategy that supports online distributed decision-making is quite challenging.
Disclosure of Invention
The invention aims to design an efficient heterogeneous-resource integration scheme that optimizes the performance and cost of real-time services, and establishes a dynamic edge system supporting cooperative migration of real-time services. To minimize the delay and payment of service execution, the invention designs a lightweight continual-imitation service cooperative-migration algorithm: an offline matching-based expert strategy provides expert trajectories for the agents, and a distributed agent strategy is trained by imitation learning on the obtained expert dataset, minimizing the error between state-action pair distributions to fit the expert strategy. The method avoids the high learning load of traditional algorithms and reduces learning cost, and uses meta-updates to accelerate model training and realize lightweight continual imitation.
The main inventive content is summarized as follows:
1. The invention constructs an intelligent service cooperative-migration framework based on combinatorial resource optimization and provides a pricing mechanism that reflects service-cooperation willingness. The problem is formulated as a dual-objective optimization problem minimizing execution latency and payment, and is decomposed, via analysis of the optimal execution latency, into selecting the execution devices and determining the optimal migration ratio.
2. The invention provides an imitation-Learning-based Online Service collaborative migration strategy (LOS). An offline expert strategy obtains optimal matching results to generate an expert-trajectory dataset for the agents.
3. The invention provides a lightweight online agent strategy that makes online decisions by imitating the obtained expert-trajectory dataset. To overcome staleness of the expert dataset, the invention applies meta-learning to accelerate model migration when updating agent strategies, reducing the effort of continuously training the models.
In view of the above, the technical scheme adopted by the invention is as follows: a real-time service migration method based on lightweight learning comprises the following steps:
1) Constructing a dynamic edge network model. Areas are divided according to the communication capacity of the infrastructure; each area contains service providers and service requesters. Service migration is executed in discrete time slots. A user terminal can act both as a service requester and as a service provider, and a service generated by a requester can be partially migrated to other devices for execution. Migration execution of a service consists of three steps: input, execution and output. The requester splits the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost.
2) Formulating the service migration problem. The service delay and the migration payment cost are taken as the indexes of service cooperative-migration performance and cost, respectively, yielding a dual-objective optimization problem.
3) The infrastructure makes an optimal matching strategy based on the observed global state.
4) The expert dataset is passed to the agents, which train the agent policy by imitation learning.
5) The agent trains its policy on the expert dataset and accelerates the model-update process with a meta-learning strategy, eliminating the learning cost of a traditional neural network and reducing the traditional learning load. Every d time slots form an update period; in each period the expert-trajectory dataset is refreshed and provided to the distributed agents for learning. Each device must learn and update its policy independently from its observable information to ensure the accuracy of the policy.
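A minimal sketch of the agent-side training in steps 4) and 5): the agent policy is fit to the expert trajectory by behavior cloning, and the meta update here is a first-order (Reptile-style) rule chosen for illustration. The linear policy, learning rates, and toy threshold expert are all assumptions, not the patent's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def clone_step(W, states, expert_actions, lr=0.5):
    """One behavior-cloning step: move the linear policy softmax(s @ W)
    toward the expert's action labels (cross-entropy gradient)."""
    probs = softmax(states @ W)
    onehot = np.eye(W.shape[1])[expert_actions]
    grad = states.T @ (probs - onehot) / len(states)
    return W - lr * grad

def meta_update(W_meta, trajectories, inner_steps=100, meta_lr=0.3):
    """Reptile-style meta update (illustrative): adapt a copy of the
    meta-parameters on each expert trajectory, then move the
    meta-parameters toward the adapted weights."""
    for states, actions in trajectories:
        W = W_meta.copy()
        for _ in range(inner_steps):
            W = clone_step(W, states, actions)
        W_meta = W_meta + meta_lr * (W - W_meta)
    return W_meta

# Toy expert trajectory: migrate (action 1) when the observed load
# (first feature) exceeds 0.5; a constant bias feature lets the
# linear policy learn the threshold.
raw = rng.random((200, 1))
states = np.hstack([raw, np.ones((200, 1))])   # [load, bias]
actions = (raw[:, 0] > 0.5).astype(int)

W = meta_update(np.zeros((2, 2)), [(states, actions)])
accuracy = (softmax(states @ W).argmax(axis=1) == actions).mean()
```

In the patent's setting the expert trajectories would come from the offline matching strategy and be refreshed every d slots; here a synthetic threshold expert merely stands in for them.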
The invention has the following advantages and beneficial effects:
1. The invention constructs an intelligent service cooperative-migration framework based on combinatorial resource optimization, achieving full utilization of resources by jointly optimizing heterogeneous resources. A pricing mechanism reflecting service-cooperation willingness is then provided, so that the state of a service provider is reflected in its price. The problem is formulated as a dual-objective optimization problem minimizing execution latency and payment, thereby optimizing execution performance and cost simultaneously, and is decomposed, by analyzing the optimal execution latency, into selecting execution devices and determining the optimal migration ratio.
2. The invention provides an online service cooperative-migration strategy based on imitation learning. An offline expert strategy obtains optimal matching results to generate an expert-trajectory dataset for the agents; through matching, this strategy yields optimal migration results on which the agents train their local models.
3. The invention provides a lightweight online agent strategy that makes online decisions by imitating the obtained expert-trajectory dataset. To overcome staleness of the expert dataset, the invention applies meta-learning to accelerate model migration when updating agent strategies, reducing continuous-training effort. By retaining part of the prior knowledge and recording the migration process, the agent's policy can be updated at low load, so the agent updates its training model with very little work, the update process is accelerated, and execution in practice becomes more efficient.
Drawings
FIG. 1 is a diagram of an illustrative system model for service migration in a dynamic network;
FIG. 2 is a service migration illustration;
FIG. 3 is a schematic diagram of the variation of the percentage of power consumed, the available CPU frequency and the rent;
FIG. 4 is a graph of the accuracy performance of the algorithm proposed by the present invention and other representative algorithms for different update rounds;
FIG. 5 is a graph of the performance of the execution times of the algorithm proposed by the present invention and other representative algorithms for different update rounds;
FIG. 6 is a graph of mobility profiles for the proposed algorithm and other representative algorithms under low workload;
FIG. 7 is a graph of mobility distribution for the proposed algorithm of the present invention and other representative algorithms at high workload;
FIG. 8 is a graph of achievable QoS distribution for the proposed algorithm and other representative algorithms under low workload;
FIG. 9 is a graph of achievable QoS distribution for the proposed algorithm and other representative algorithms under high workload;
FIG. 10 is a graphical illustration of the effect of service data size on average latency for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 11 is a graphical illustration of the effect of service data size on average latency for the algorithm proposed by the present invention and other representative algorithms at high workload;
FIG. 12 is a graphical illustration of the effect of service data size on average pay cost for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 13 is a graphical illustration of the effect of service data size on average pay cost for the proposed algorithm and other representative algorithms under high workload;
FIG. 14 is a graphical illustration of the impact of service data size on average energy consumption ratio for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 15 is a schematic diagram of the effect of service data size on the average energy consumption ratio of the algorithm of the present invention and other representative algorithms at high workload;
FIG. 16 is a graph illustrating the effect of service data size on the average time-to-live gain of the proposed algorithm and other representative algorithms at low workload;
FIG. 17 is a graph illustrating the effect of service data size on the average time-to-live gain of the proposed algorithm and other representative algorithms at high workload;
FIG. 18 is a graphical illustration of the effect of communication range on average delay for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 19 is a graph illustrating the effect of communication range on average delay for the algorithm of the present invention and other representative algorithms under high workload;
FIG. 20 is a graphical illustration of the effect of range on average payment for the proposed algorithm and other representative algorithms of the present invention at low workload;
FIG. 21 is a graphical illustration of the effect of communication range on average payment cost for the algorithm proposed by the present invention and other representative algorithms at high workload;
FIG. 22 is a graphical illustration of the effect of range on average power consumption for the algorithm of the present invention and other representative algorithms at low workload;
FIG. 23 is a schematic diagram of the effect of communication distance on the average power consumption ratio of the algorithm of the present invention and other representative algorithms under high operating load;
FIG. 24 is a graphical illustration of the effect of range on average time-to-live gain for the algorithm proposed by the present invention and other representative algorithms at low workload;
FIG. 25 is a graphical illustration of the effect of range on average time-to-live gain of the proposed algorithm and other representative algorithms at high workload;
FIG. 26 is a graph showing the effect of the number of classes of service on the average latency of the proposed algorithm and other representative algorithms under low workload;
FIG. 27 is a graph illustrating the effect of the number of classes of service on the average latency of the proposed algorithm and other representative algorithms under high workload;
FIG. 28 is a graphical illustration of the impact of the number of service classes on the average payment for the proposed algorithm and other representative algorithms under low workload;
FIG. 29 is a graphical illustration of the impact of the number of classes of service at high workload on the average payment for the proposed algorithm and other representative algorithms of the present invention;
FIG. 30 is a graphical illustration of the impact of the number of classes of service on the average energy consumption ratio of the proposed algorithm and other representative algorithms at low workload;
FIG. 31 is a graphical illustration of the impact of the number of classes of service under high workload on the average energy consumption ratio of the proposed algorithm and other representative algorithms;
FIG. 32 is a graphical illustration of the impact of the number of classes of service on the average time-to-live gain of the proposed algorithm and other representative algorithms at low workload;
FIG. 33 is a graph illustrating the effect of number of service classes on average time-to-live gain for the algorithm proposed by the present invention and other representative algorithms at high workload.
Detailed Description
In order to show the advantages of the present invention more clearly and in detail, the following description will further describe the embodiments of the present invention with reference to the drawings.
The invention provides an efficient service cooperative migration framework, aims to design an efficient heterogeneous resource integration scheme to provide optimized service performance and service cost of real-time service, and provides a lightweight learning scheme based on imitation learning by analyzing the optimal migration rate of service cooperative migration.
Step 1):
FIG. 1 is an illustrative system-model diagram for service migration in a dynamic network. As shown, a dynamic edge network divides areas according to the communication capabilities of the infrastructure; each area contains service providers and service requesters. To capture dynamic conditions, service migration is performed in discrete time slots. A user terminal (such as a vehicle or a smart device) can act as a service provider while also being a service requester, and a service generated by a requester can be partially migrated to another device for execution.
The detailed migration execution process of a service is divided into three steps, input, execution and output, as shown in the service-migration illustration of FIG. 2. The input of a service comprises two parts: the service data and the data packets the service requires. The service requester splits the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost.
In time slot $t$, the set of randomly arriving services, of size $n_t$, can be represented as $\mathcal{S}(t) = \{S_1(t), S_2(t), \dots, S_{n_t}(t)\}$, where $S_i(t)$ is the service request generated by device $D_i(t)$. For the different services, $k_i \in \{1, 2, \dots, K\}$ indicates the service class, and $K$ represents the total number of service classes.
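For concreteness, the per-slot request model above can be sketched as follows; the field names, value ranges, and the Poisson arrival rate are illustrative assumptions, not the patent's notation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ServiceRequest:
    requester: int       # index i of the generating device D_i(t)
    service_class: int   # class k in {1, ..., K}
    data_mb: float       # service input-data size
    cycles: float        # computing resources required to execute

def generate_requests(t, n_devices=10, n_classes=5, rate=3.0, seed=0):
    """Draw the random number n_t of requests arriving in slot t
    (Poisson arrivals, matching the later queuing assumption) and
    attach randomly drawn demands."""
    rng = np.random.default_rng(seed + t)
    n_t = int(min(n_devices, rng.poisson(rate)))
    return [ServiceRequest(requester=int(rng.integers(n_devices)),
                           service_class=int(rng.integers(1, n_classes + 1)),
                           data_mb=float(rng.uniform(1.0, 50.0)),
                           cycles=float(rng.uniform(1e8, 1e9)))
            for _ in range(n_t)]

requests = generate_requests(t=1)
```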
Step 1.1):
The details of the service execution model are as follows:
The scenario studied comprises two communication modes, i.e. device-to-device communication and device-to-infrastructure communication. The achievable communication rate $r_{ij}(t)$ between two devices can be calculated from the Shannon formula:

$$r_{ij}(t) = B_{ij} \log_2\big(1 + \Gamma_{ij}(t)\big),$$

where $B_{ij}$ represents the communication bandwidth between device $i$ and device $j$, and $\Gamma_{ij}(t)$ represents the signal-to-interference-plus-noise ratio between device $i$ and device $j$ in time slot $t$. Once the communication conditions between device $i$ and device $j$ satisfy the restrictions, a communication link can be established.
To guarantee communication quality, it is assumed that a user terminal communicates with only one user device at a time, i.e. device-to-device communications do not interfere with each other, so

$$\Gamma_{ij}(t) = \frac{p_i^{comm}(t)\, g_{ij}(t)}{\sigma^2},$$

where $p_i^{comm}(t)$ denotes the communication transmission power of device $D_i(t)$, $g_{ij}(t)$ denotes the channel gain between devices $D_i(t)$ and $D_j(t)$, and $\sigma^2$ denotes the additive white Gaussian noise power. Accordingly, if device $D_i(t)$ and the infrastructure $R(t)$ satisfy the communicability conditions, a communication link can be constructed based on non-orthogonal multiple access, and the signal-to-interference-plus-noise ratio $\Gamma_{ir}(t)$ can be calculated as

$$\Gamma_{ir}(t) = \frac{p_i^{comm}(t)\, g_{ir}(t)}{\sigma^2 + \sum_{j \in \mathcal{D}(t)\setminus\{i\}} p_j^{comm}(t)\, g_{jr}(t)},$$

where $p_j^{comm}(t)$, $g_{jr}(t)$ and $\mathcal{D}(t)$ represent the communication power of the other devices, their channel gains to the infrastructure, and the device set, respectively. In time slot $t$, a service provider may receive more than one transmission request from other devices. The invention sets each request to follow first-come-first-served service, with arrivals following a Poisson distribution. Each user terminal has only one service table and can accommodate up to $N$ requests, so the service requests received by each device can be modeled as an M/G/1 queuing system. The transmission waiting delay $T_{ij}^{wait}(t)$ then follows the Pollaczek-Khinchine mean-wait formula:

$$T_{ij}^{wait}(t) = \frac{\lambda\big(\bar{t}_{ij}^{\,2} + \theta^2\big)}{2\big(1 - \lambda\, \bar{t}_{ij}\big)},$$

where $\lambda$ represents the transmission intensity of the tasks, $\bar{t}_{ij}$ represents the average transmission delay between the two devices, and $\theta^2$ represents the variance of the transmission delay. The communication delay $T_{ij}^{comm}(t)$ can then be calculated as $T_{ij}^{comm}(t) = T_{ij}^{wait}(t) + T_{ij}^{tran}(t)$, where $T_{ij}^{tran}(t)$ represents the transmission delay of the task data.
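The link-level quantities above can be computed directly. The sketch below uses the Shannon rate, the interference-free device-to-device SINR, and the standard Pollaczek-Khinchine mean-wait formula for an M/G/1 queue, consistent with the queuing assumption stated above; function names and numbers are illustrative.

```python
import math

def shannon_rate(bandwidth_hz, sinr):
    """Achievable rate r_ij(t) = B_ij * log2(1 + Gamma_ij(t))."""
    return bandwidth_hz * math.log2(1.0 + sinr)

def sinr_d2d(p_tx, gain, noise):
    """Interference-free device-to-device SINR (one peer at a time)."""
    return p_tx * gain / noise

def mg1_wait(arrival_rate, mean_service, service_var):
    """Mean waiting time of an M/G/1 queue (Pollaczek-Khinchine):
    W = lam * E[S^2] / (2 * (1 - lam * E[S]))."""
    rho = arrival_rate * mean_service
    assert rho < 1.0, "queue must be stable"
    second_moment = service_var + mean_service ** 2
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))

rate = shannon_rate(1e6, sinr_d2d(0.1, 1e-6, 1e-9))  # bits per second
wait = mg1_wait(arrival_rate=2.0, mean_service=0.1, service_var=0.01)
comm_delay = wait + 8e6 / rate   # queueing plus transmitting a 1 MB task
```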
When no service is found in the available devices, the service provider needs to download the data packets required for the service from the network and store them with sufficient remaining storage resources. The present invention recognizes that different packets buffered in a device follow a random distribution. The buffered packets may be shared with other communicable devices. Due to the scarcity of spectrum resources communicating with the infrastructure, the infrastructure can only download packets from the network.
After obtaining all of the input data, the service provider can supply computing resources to perform the service. The achievable service-execution processing rate $v_{ij}(t)$ (in megabytes per second) can be calculated as

$$v_{ij}(t) = \frac{d_i(t)\, \big(\alpha_{ij}(t)\, f_j(t) + \alpha_{ir}(t)\, R^{comp}(t)\big)}{c_i(t)},$$

where $d_i(t)$ is the data size of service $S_i(t)$, $\alpha_{ij}(t)$ is the task-migration decision variable (with $i = j$ indicating that the service executes locally, and $\alpha_{ir}(t)$ indicating execution on the infrastructure), $f_j(t)$ is the available computing resource of device $D_j(t)$, $R^{comp}(t)$ is the available computing resource of the infrastructure, and $c_i(t)$ is the computing resource required by service $S_i(t)$.
Based on the above model, the delay of executing service $S_i(t)$ on device $D_j(t)$ comprises four parts: the service-data acquisition delay, the acquisition delay of the packets the service requires, the execution delay, and the feedback delay. According to the migration ratio $\gamma_i(t)$, the service is divided into a migrated part and a locally executed part. The invention defines a binary decision variable $\alpha_{ij}(t)$ to indicate the selected service provider; when $\alpha_{ir}(t) = 1$, the service is migrated to the infrastructure for execution. A binary decision variable $\beta_{ijh}(t)$ indicates the packet-sharing device; when the corresponding condition holds, the data packet required by the service is obtained by downloading. Thus, the local execution delay $T_i^{loc}(t)$ can be calculated as

$$T_i^{loc}(t) = T_i^{l,comp}(t) + T_i^{l,pkt}(t),$$

i.e. the sum of the local computing delay $T_i^{l,comp}(t)$ and the local packet-acquisition delay $T_i^{l,pkt}(t)$. The local computing delay is calculated as

$$T_i^{l,comp}(t) = \frac{\big(1 - \gamma_i(t)\big)\, c_i(t)}{f_i(t)},$$

where $\gamma_i(t)$ is the migration ratio of service $S_i(t)$, $c_i(t)$ is the computing resource required to execute $S_i(t)$, and $f_i(t)$ is the computing capability of device $D_i(t)$.
The local packet-acquisition delay $T_i^{l,pkt}(t)$ is as follows:

$$T_i^{l,pkt}(t) = \beta_{iih}(t)\left(\frac{s_h^{pkt}}{r_{ih}^{pkt}(t)} + T_{ih}^{wait}(t)\right) + \big(1 - \beta_{iih}(t)\big)\, \frac{s_h^{pkt}}{r_i^{down}(t)},$$

where $\beta_{iih}(t)$ is the packet-acquisition decision variable, $s_h^{pkt}$ is the data-packet size, $r_{ih}^{pkt}(t)$ is the communication rate for obtaining the packet locally, $T_{ih}^{wait}(t)$ is the transmission waiting delay, and $r_i^{down}(t)$ is the packet download rate.
Meanwhile, the migration execution delay $T_i^{mig}(t)$ is calculated as

$$T_i^{mig}(t) = T_{ij}^{comm}(t) + T_{ij}^{comp}(t) + T_{ij}^{pkt}(t),$$

where $T_{ij}^{comm}(t)$ is the communication delay between the two devices, $T_{ij}^{comp}(t)$ is the computing delay on the executing device, and $T_{ij}^{pkt}(t)$ is the acquisition delay of the data packets the service requires. These terms are determined by the decision variables and resource states: $\alpha_{ij}(t)$ is the binary decision variable selecting the execution device; $\gamma_i(t)$ is the migration-ratio decision variable of service $S_i(t)$; $d_i(t)$ and $d_i^{out}(t)$ are the input-data and output-data sizes of $S_i(t)$; $r_{ij}(t)$ is the communication rate and $T_{ij}^{wait}(t)$ the communication waiting delay between the two devices; $\beta_{ijh}(t)$ is the packet-acquisition decision variable, $s_h^{pkt}$ the required packet size, $r_j^{down}(t)$ the device download rate, and $c_h^{pkt}$ the computing resource required for the packet; $f_j(t)$ is the computing resource available at the device, $R^{comp}(t)$ the infrastructure's available computing resource, and $R^{down}(t)$ the infrastructure's data download rate.
Since the local and migrated parts are executed in parallel, the total service execution delay $T_i(t)$ can be obtained as

$$T_i(t) = \max\big\{T_i^{loc}(t),\; T_i^{mig}(t)\big\},$$

i.e. the maximum of the local execution delay $T_i^{loc}(t)$ and the migration execution delay $T_i^{mig}(t)$.
Step 1.2):
The rent model is detailed as follows:
Owing to users' rationality and selfishness, a fair incentive mechanism is needed to facilitate device cooperation. In the invention, the unit rental price $p_j^{rent}(t)$ of computing resources varies with the state of device $D_j(t)$ and is defined as a function of two factors, weighted by a price coefficient $\kappa$: the available computing capability $f_j(t)$ and the remaining available energy $e_j(t)$ of the device. Both factors are negatively correlated with the unit rent, reflecting the willingness trend for profiting from leased resources. The pricing function is divided into two parts, reflecting the different sensitivities of computing power and remaining available energy to pricing, respectively; an exponential function is selected to represent the higher sensitivity to battery charge. If $e_j(t)$ is extremely low, then whatever computing resources $D_j(t)$ has available, the price is raised to avoid failures caused by excessive power consumption.
FIG. 3 shows an example of the impact of the two relevant state factors on pricing for $\kappa = 0.5$; the level of the price reflects a service provider's propensity to lease resources to serve a requester. The horizontal axis represents time slots, and the vertical axis represents the values of remaining energy, available computing resources, and rent. Clearly, when the remaining energy is quite low, the rent rises dramatically, no matter how much computing resource is available, to prevent a collapse from power drain.
Infrastructure deployed in the real world has a fixed power supply, so its remaining energy can be considered sufficient at all times. The rent function of the infrastructure is therefore calculated with the energy term fixed at 1 (i.e. the remaining available energy is always sufficient), where R comp (t) is the available computing resources of the infrastructure and κ is the price coefficient. The corresponding energy consumption is calculated from γ i (t), the mobility of service S i (t); the local computation delay and e comp , the percentage of energy consumed per computation unit; the local download delay and e down , the percentage of energy consumed per download unit; and the communication delay and e comm , the percentage of energy consumed per communication unit.
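The energy bookkeeping above can be sketched as follows. The exact combination of terms is an image in the original, so the mobility weighting in `migration_energy` is an assumption; only the listed factors (the mobility γ i (t), the three delays, and e comp , e down , e comm ) come from the text.

```python
def migration_energy(gamma, t_comp_local, t_down_local, t_comm,
                     e_comp=0.02, e_down=0.01, e_comm=0.03):
    """Hedged sketch of the per-service energy consumption: the locally
    kept share (1 - gamma) pays computation and download energy, the
    migrated share gamma pays communication energy. The split is an
    illustrative assumption; the patent's formula is an image."""
    local_part = (1.0 - gamma) * (t_comp_local * e_comp + t_down_local * e_down)
    migrated_part = gamma * t_comm * e_comm
    return local_part + migrated_part

# Fully local execution consumes no communication energy.
assert migration_energy(0.0, 2.0, 1.0, 3.0) == 2.0 * 0.02 + 1.0 * 0.01
```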
Step 2):
The detailed steps of the optimization objective construction are as follows:
To reduce the influence of the time-varying heterogeneous resource states on the performance of service cooperative migration, the service delay and the migration payment cost are used as the indexes of cooperative-migration performance and cost respectively. The dual-objective optimization problem P1 minimizes, over the execution slots of length T, the average service execution delay T i (t) and the average resource lease cost P i (t), where α ij (t) is the service migration device decision variable, β ijh (t) is the service data packet acquisition decision variable, γ i (t) is the service mobility decision variable, and S is the total number of service requests to be executed. P1 is subject to the constraints C1-C7:
Constraint C1 ensures that the execution delay of a service cannot exceed its tolerable delay, guaranteeing the user's quality of experience, where T i (t) is the service execution delay and the bound is the tolerable delay of K i -class services. Constraint C2 ensures that the migration portion of each service completes within the communicable time, i.e. the migration execution delay must not exceed the communicable delay between the two devices. Constraint C3 ensures that each service provider does not exhaust its remaining energy, preventing service interruption due to energy exhaustion; it bounds the execution energy consumption by the remaining energy of every device D i (t) in the device set. C4 defines the upper limit of the device's communication capacity with the infrastructure, where α ij (t) is the device migration decision variable and R ch (t) is the upper limit on the number of channels. Constraint C5 restricts the binary decision variables α ij (t) and β ijh (t), the decision variables of the device migration and service data packet acquisition modes respectively, with n t the total number of devices. C6: γ i (t) ∈ [0,1] gives the value range of the service mobility. Constraint C7 indicates that when the mobility γ i (t) = 0, no service provider provides cooperation.
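The constraint set C1-C7 can be checked mechanically. The inequalities themselves are images in the original, so the sketch below encodes only their described meanings (delay bound, communicable-time bound, energy bound, channel cap, binary variables, mobility range, and the no-cooperation case); the dictionary field names are invented for illustration.

```python
def feasible(decision, limits):
    """Check one candidate migration decision against constraints in the
    spirit of C1-C7; the exact inequalities are images in the source,
    so this is an illustrative sketch with assumed field names."""
    c1 = decision["exec_delay"] <= limits["tolerable_delay"]         # C1: QoE delay bound
    c2 = decision["mig_delay"] <= limits["communicable_delay"]       # C2: finish while communicable
    c3 = decision["energy_used"] <= limits["remaining_energy"]       # C3: no energy exhaustion
    c4 = decision["channels"] <= limits["channel_limit"]             # C4: channel cap
    c5 = decision["alpha"] in (0, 1) and decision["beta"] in (0, 1)  # C5: binary variables
    c6 = 0.0 <= decision["gamma"] <= 1.0                             # C6: mobility range
    c7 = decision["gamma"] > 0 or decision["alpha"] == 0             # C7: no provider => no migration
    return all((c1, c2, c3, c4, c5, c6, c7))
```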
And step 3):
The constructed optimization problem P1 is transformed as follows:
Since the purpose of problem P1 is to minimize the average performance of service cooperative migration, the invention minimizes the per-slot average delay T i (t) and cost P i (t), converting P1 into a problem P2 constrained by C1-C7. Because of the mobility, the service execution delay T i (t) is lowest when the local execution delay equals the migration execution delay, so P2 can be rewritten as a problem P3, also constrained by C1-C7. Since the two decision variables α ij (t) and β ijh (t) are coupled, in order to evaluate the Pareto-optimal solution the invention defines a utility metric expressing the optimal cost; with it, the joint optimization problem can be decomposed into two sub-problems: P4, constrained by C3-C5, and P5, constrained by C1, C2, and C7.
And step 4):
The detailed steps of acquiring the expert trajectory are as follows:
The system of the present invention involves multiple devices and multiple migrated services simultaneously. In time slot t, the service requesters and the service providers can be constructed as two disjoint sets of entities. The benefit of migrating to each device can be derived from the observed global state, so the presented problem can be translated into a matching problem that maximizes the overall benefit.
Step 4.1):
At the beginning of each time slot in an update round, the device match counts D j (t).visit and the service match counts S i (t).visit are first initialized to 0; then the preference value of each device is initialized to 0, and the adjustment parameters Δ j (t) and the adjustment factor δ are initialized to ∞.
step 4.2):
For each service request, the optimal mobility for execution on each candidate migration device is first obtained, and the matching decisions α ij (t) and β ijh (t) are derived from it. The lower limit of the mobility is determined by the tolerable delay of the K i -class service together with the local packet acquisition delay and the local computation delay. When the communicable time is the binding restriction, the upper limit of the mobility is determined by the communicable delay between the two devices, the communication waiting delay, the packet acquisition delay, the communication delay, and the computation delay; otherwise, the upper limit is determined by the tolerable delay of the service together with the same communication waiting, packet acquisition, communication, and computation delays. Since the optimal delay is attained when the local delay and the migration delay are equal, the optimal mobility can be expressed in terms of the local packet acquisition delay, the local computation delay, the migration execution delay, the packet acquisition delay, the communication delay, and the computation delay. The actual execution delay of the task can then be observed: if it exceeds what the service can tolerate, γ i (t) = 0, and otherwise the mobility is obtained by bounding the optimal mobility within its lower and upper limits.
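The balance condition of step 4.2), local delay equal to migration delay, can be sketched in closed form, assuming both sides scale linearly with the shares (1 − γ) and γ. That linear scaling, and the clipping to the lower and upper limits, are assumptions, since the original formulas are images.

```python
def optimal_mobility(t_get_l, t_comp_l, t_get, t_comm, t_comp,
                     gamma_lb=0.0, gamma_ub=1.0):
    """Hedged sketch of the optimal mobility: balancing
    (1 - gamma) * (t_get_l + t_comp_l) = gamma * (t_get + t_comm + t_comp)
    gives the closed form below, projected onto [gamma_lb, gamma_ub].
    Delay arguments follow the factors listed in the text; the linear
    model of how delay scales with the migrated share is assumed."""
    local = t_get_l + t_comp_l            # local acquisition + computation delay
    mig = t_get + t_comm + t_comp         # migration acquisition + comm + computation delay
    gamma = local / (local + mig)
    return min(max(gamma, gamma_lb), gamma_ub)
```

With equal local and migration unit delays the optimum splits the service evenly (γ = 0.5), and the projection enforces the limits derived from the tolerable and communicable delays.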
step 4.3):
For each attempted migration device, if the constraints C1-C7 are satisfied, the benefit U ij (t) is added to the preference list of service S i (t) in descending order; otherwise γ i (t) is set to 0 and the corresponding benefit U ij (t) is added to the preference list. A priority value is then obtained for each service request from all preference values, as the maximum preference value over all services.
Step 4.4):
for service S in service request set i (t) device set
Figure BDA0003777850300000122
And executing matching operation, wherein the specific execution process is as follows: from the collection
Figure BDA0003777850300000123
In is S i (t) finding a suitable matching procedure for the performing device. Defining an expected value U ij (t) is
Figure BDA0003777850300000124
And
Figure BDA0003777850300000125
and (4) the sum. If it satisfies
Figure BDA0003777850300000126
Then S i (t) migration to device D j (t) and returning a matching result. Otherwise, the tuning parameters Δ are matched j (t) needs to be updated to
Figure BDA0003777850300000127
Wherein
Figure BDA0003777850300000128
For service S i (t) a preference value for the value of (t),
Figure BDA0003777850300000129
is a device D j Preference value of (t), U ij (t) is a desired value.
Step 4.5):
if no matching result is returned in step 4.4), an update operation is performed to update the list of tuning variables for the device that has not been previously matched
Figure BDA00037778503000001210
The adjustment factor is updated to min { delta, delta j (t) }, where δ is the adjustment factor initialized to ∞, Δ j (t) adjusting all accessed service preference values to adjustment variables
Figure BDA00037778503000001211
Adjusting all vehicle preferences to
Figure BDA00037778503000001212
And all the adjustment variables Delta j (t) update to Δ j (t)-δ。
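The matching of steps 4.2)-4.5) can be approximated by a much simpler greedy assignment over the benefits U ij (t). The preference lists and adjustment parameters Δ j (t), δ of the actual procedure are replaced here by a single descending sweep, so this is a sketch of the matching objective rather than the patented algorithm.

```python
def match_services(benefit):
    """Greedy stand-in for the benefit-maximizing matching: benefit[i][j]
    is U_ij(t) for service i on device j; each service and each device
    is matched at most once, taking pairs in descending benefit order."""
    pairs = sorted(((u, i, j) for i, row in enumerate(benefit)
                    for j, u in enumerate(row)), reverse=True)
    matched_s, matched_d, result = set(), set(), {}
    for u, i, j in pairs:
        if i not in matched_s and j not in matched_d:
            result[i] = j
            matched_s.add(i)
            matched_d.add(j)
    return result
```

A one-to-one benefit-maximizing matching of this kind is exactly what the expert node computes from the global state; the deferred adjustment of preference values in step 4.5) resolves the conflicts that the greedy sweep handles by first-come order here.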
The invention sets the infrastructure as the expert node, which obtains the complete global state and constructs the expert trajectory <s(t), a(t)>. The execution phase is divided into batches, each containing a number of state-action pairs used to update the imitation policy. To realize real-time service migration in the edge network, the invention provides a lightweight distributed online agent imitation strategy: after the expert strategy completes, the expert trajectories are collected as a data set and transmitted to the agent training strategy as needed.
Step 5):
The detailed steps of the online agent strategy are as follows:
In a dynamic edge network, devices are treated as distributed agents, and agent policies are trained to make migration decisions by imitating expert trajectories and approximating the expert policy. However, an excessively large expert trajectory data set creates a huge communication burden and becomes obsolete over time, so the agent must retrain its model to prevent performance loss, a repetitive process that consumes enormous computing resources. Aiming at this problem, the invention provides a lightweight online agent strategy that continuously imitates the updated expert trajectory from a small number of demonstrations.
The imitation learning process involves two participants: the expert and the agents. d time slots are set as an update period; in each update period the expert trajectory data set is updated and provided to the distributed agents for learning. The update period is indexed by l, and the data set ε l contains d sampled trajectory data used to construct the expert strategy. Each device must learn independently and update its policy independently based on observable information to ensure the accuracy of the strategy. The agent strategy is updated in the following steps:
step 5.1):
The initial model is pre-trained to provide prior knowledge before the model is updated. After obtaining the initial expert demonstration data set ε 0 and the expert strategy, each agent obtains an initial agent model by training a neural network. The agent network estimates an action based on the observed state and trains its policy by fitting the observed states and the estimated action distribution to the expert strategy π e (a, s) according to a loss function, where the agent policy is parameterized by θ, a denotes the actual action, s the observed state, θ 0 the initial parameters, and the loss is an expectation over the demonstration data evaluated with frozen parameters. The parameters are then updated by gradient descent: each step moves θ opposite the gradient of the loss function with step size l b , the learning rate of the basic learner.
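Step 5.1) is standard behavior cloning: fit the agent network to expert state-action pairs by gradient descent on an imitation loss. A minimal sketch with a linear policy and a squared-error loss follows; the patent's actual loss function and network are images in the original, so both choices are assumptions.

```python
def clone_policy(demos, lr=0.1, epochs=200):
    """Minimal behavior-cloning sketch of step 5.1): fit a linear policy
    a_hat = w*s + b to expert (state, action) pairs by stochastic
    gradient descent on a squared imitation loss (assumed form)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for s, a in demos:
            grad = (w * s + b) - a   # gradient of 0.5*(a_hat - a)^2 w.r.t. a_hat
            w -= lr * grad * s
            b -= lr * grad
    return w, b

# Expert demonstrations follow a = 2*s; the cloned policy approaches it.
w, b = clone_policy([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
```

The learning rate `lr` plays the role of the basic learner's rate l b in the text.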
Step 5.2):
in the refresh period
Figure BDA00037778503000001312
In (1),
Figure BDA00037778503000001313
representing a set of update periods. The agent obtains a partially updated expert trajectory epsilon l . To speed up the process of repeated model migration, the present invention utilizes meta-learning to record the scaling and translation of model migration. The meta-learning parameter in the period l is denoted as ω l . The meta learning process will
Figure BDA00037778503000001314
Is converted into
Figure BDA00037778503000001315
By passing
Figure BDA00037778503000001316
To obtain omega l . The goal of meta-learning is to make
Figure BDA00037778503000001317
Is similar to
Figure BDA00037778503000001318
The meta-update of an agent comprises two sub-phases, namely basic learner training and meta-learner training. In the first period, randomly extracting the expert track epsilon from the data set e,l Then sampling
Figure BDA00037778503000001319
Number of stripsTraining basic learning model according to the data, sampling
Figure BDA00037778503000001320
To train meta model learning, an
Figure BDA00037778503000001321
Temporary parameter θ' l From the parameter theta of the l-1 period l-1 The initialization is derived and used for fine tuning, updated as:
Figure BDA00037778503000001322
wherein l b Based on the learning rate of the base learner,
Figure BDA00037778503000001323
to gradient the loss function of the basis learner,
Figure BDA00037778503000001324
to freeze the parameter, θ l-1 Parameter of period l-1, ω l-1 Are meta learner parameters. Thus the parameter omega of the meta learner l The updating is as follows:
Figure BDA00037778503000001325
wherein l m Based on the learning rate of the base learner,
Figure BDA00037778503000001326
to solve the gradient of the loss function of the meta-learner,
Figure BDA00037778503000001327
is a freezing parameter, θ' l As a temporary parameter, ω l-1 Is the meta-learner parameter for the l-1 period. Thus agent parameter θ l Can be updated as:
Figure BDA00037778503000001328
wherein l m Based on the learning rate of the base learner,
Figure BDA00037778503000001329
to solve the gradient of the loss function of the meta-learner,
Figure BDA00037778503000001330
is a freezing parameter, θ' l As a temporary parameter, ω l Is the meta-learner parameter for the l period.
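The three updates of step 5.2) share one shape: step a parameter opposite a loss gradient scaled by a learning rate. A minimal sketch follows, with `grad_base` and `grad_meta` standing in for the basic learner's and meta learner's gradients (the actual loss functions are images in the original, so these stand-ins and the scalar parameters are assumptions).

```python
def meta_update(theta_prev, omega_prev, grad_base, grad_meta,
                l_b=0.1, l_m=0.05):
    """Sketch of step 5.2)'s two sub-phases: basic-learner fine-tuning
    of the temporary parameter theta', then meta-learner and agent
    parameter updates with the meta learning rate l_m."""
    theta_tmp = theta_prev - l_b * grad_base  # basic-learner fine-tuning (theta')
    omega = omega_prev - l_m * grad_meta      # meta-learner parameter update
    theta = theta_tmp - l_m * grad_meta       # agent parameter update under omega
    return theta, omega

theta, omega = meta_update(1.0, 0.5, grad_base=2.0, grad_meta=1.0)
```

Initializing `theta_prev` from the previous period is what lets the agent keep prior knowledge while adapting to the refreshed expert data set.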
Step 5.3):
after completing the first training of the agent, the distributed agent according to the strategy
Figure BDA00037778503000001331
And making a migration decision based on the observed state until the agent enters the coverage of other infrastructures or until the (l + 1) th updating period, repeating the 2 nd stage by the agent for updating, wherein the updating process of the agent model can continuously imitate the expert strategy in a lightweight way, and the expert data set is effectively adapted while some known prior knowledge is kept.
Through the above steps, the cooperative migration provided by the invention is realized. Figures 4 and 5 show the efficiency of the invention. The expert trajectory needs to be updated at intervals to prevent performance loss due to data obsolescence and to ensure its timeliness. As shown in fig. 4, the runtime of the proposed LOS agent policy drops sharply when the expert trajectory is updated. Combined with the accuracy performance of fig. 5, the LOS strategy without migration must retrain the agent strategy directly on the updated data set to make decisions, which wastes prior knowledge and reduces accuracy on the smaller data set; the LOS agent strategy is clearly better suited to the continuous updates required in long-term scenarios, with an accuracy around 0.74.
Fig. 6-9 illustrate the average mobility and achievable quality of service distribution for 10 update periods at low and high workloads, respectively. According to the proposed mobility acquisition scheme, the value of the optimal mobility depends on the selected service provider and the request state. The gap in achievable quality of service between high and low loads is quite small as shown in fig. 8 and 9. Based on this, the service requester of the proposed LOS policy (including the LOS agent policy and the LOS expert policy) achieves the highest quality of service except for full offloading of the service, proving the high efficiency of the proposed LOS scheme.
The policy performance at different service data sizes is shown in figs. 10-17. Figs. 10 and 11 show the average latency with increasing service data size at low and high workloads, respectively. The delay of the LOS agent policy increases from 0.91 to 3.07 seconds at low workload and from 116.85 to 426.34 seconds at high workload, just above the LOS expert policy. The LOS agent policy can accommodate different workloads by balancing communication and computational load with an adjustable migration ratio, making a reasonable trade-off between local and migrated devices. Figs. 14 and 15 show the average service-processing energy consumption percentages at different workloads. The average energy consumption percentage clearly increases with the amount of service data; combined with the decreasing payment cost in figs. 12 and 13, the LOS expert policy obtains an optimal decision that reduces costs while suppressing the rate of energy-consumption increase, providing a better expert trajectory for the LOS agent policy to imitate. Figs. 16 and 17 evaluate the average lifetime increase of service requesters at low and high workloads, respectively. The rapid drop in lifetime gain shown in fig. 17 indicates that the communication consumption exceeds the computation consumption saved at high workloads; the LOS agent strategy achieves a near-optimal lifetime gain by trading off the different workloads.
Different communication distance limitations are shown in fig. 18-25. Fig. 18 and 19 show latency performance at different workloads, respectively, where the LOS agent policy has significant advantages over other policies. Fig. 22 and 23 illustrate energy consumption at low and high workloads, respectively. The power consumption of the LOS agent increases at low workload and decreases at high workload, indicating that the service requester is more inclined to migrate services by leasing resources at high load, at the expense of slight latency and power consumption, as can also be explained in fig. 18 and 20. As shown in fig. 22, 23, 24, 25, the LOS agent policy can more flexibly adapt to different communicable restrictions by generating a global state-action distribution through an LOS expert policy that approximates an optimal result, reducing local power consumption and thus extending the life cycle of local devices.
Service performance evaluation with policies of different numbers of service classes is shown in fig. 26-33. Experiments were performed from 3 to 9 classes of service to evaluate the generalization of the algorithm in case of multiple classes of service. Under the same experiment condition, the more the cache content types are, the lower the cache hit rate between the communication devices is. Fig. 26, 27 illustrate that LOS expert policy integration can take into account the communication, computation, and buffering states of the communicable devices under the same conditions, resulting in minimal latency. The LOS agent policy with timely update policy has good emulation performance based on limited observation states. Figures 28, 29 evaluate the pay-cost performance, demonstrating a satisfactory tradeoff between cache content and migration portion for the pay-cost of LOS agent policy, the adaptation of LOS agent policy to increasing classes of service, and the ability of agent policy to accurately model expert decision distribution and obtain near-optimal decisions. As shown in fig. 30, 31, the average energy consumption of the LOS agent policy is more stable than other algorithms under different number of service classes. Under different workloads, there is only a small gap between LOS agent policies from 3 to 9 service classes, thereby improving the lifetime gains assessed in fig. 32 and 33. This is not only because the LOS agent policy considers the status of both the service requester and provider, but also because the LOS agent policy is able to obtain a global state fit based on the partially observed status. The performance gain of the LOS agent policy rises with the increase in the number of service classes, indicating that LOS can effectively adapt to multiple service class scenarios.
The above technical solutions represent only embodiments of the present invention and are not the most complete or precise solutions; as technology innovates and the era advances, more reasonable and efficient changes may be made to the solution. The exemplary embodiments were chosen and described in order to explain the principles and application of the invention, and to help researchers and technicians understand and practice its details. All such modifications and variations are intended to be included within the scope of the invention, which is determined by the following claims and their equivalents.

Claims (7)

1. A real-time service migration method based on lightweight learning is characterized by comprising the following steps:
1) Constructing a dynamic edge network model: regions are divided according to the communication capacity of the infrastructure, and a region comprises service providers and service requesters; service migration is executed in discrete time slots; a user terminal can act as a service provider while acting as a service requester; a service generated by a service requester can be partially migrated to other devices for execution; the migration execution process of a service is divided into the three steps of input, execution, and output; and the service requester divides the service into a local execution part and a migration execution part that run in parallel, dispersing the workload to improve working efficiency and reduce cost;
2) Resolving a service migration problem; respectively taking service delay and migration payment cost as indexes of service cooperative migration performance and cost to construct a dual-target optimization problem;
3) The infrastructure makes an optimal matching strategy based on the observed global state;
4) Transmitting the expert data set to the agent for the agent to train an agent strategy based on the imitation learning;
5) The intelligent agent trains the agent strategy on the expert data set and accelerates the model update process with a meta-learning strategy, eliminating the learning cost of a traditional neural network and reducing the traditional learning load; d time slots are set as an update period, the expert trajectory data set is updated in each update period and provided to the distributed intelligent agents for learning, and each device learns and updates its strategy independently from observable information to ensure the accuracy of the strategy.
2. The real-time service migration method based on lightweight learning according to claim 1, wherein step 1) specifically comprises constructing the service delay and the migration payment cost;
1.1 the service delay is composed of the local execution delay and the migration execution delay; the local execution delay consists of the local computation delay and the local packet acquisition delay; the migration execution delay consists of the communication delay between the two devices, the computation delay of the device, and the delay of acquiring the data packets required by the service;
1.2 the migration payment is calculated as follows:
the lease price of computing resources varies with the state of D j (t) and is defined as a function of the available computing power and the remaining energy, where the parameter κ is a price coefficient adjusting their impact on the unit rent;
the rent function of the infrastructure is calculated with the energy term fixed at 1, meaning the remaining available energy is always sufficient, where R comp (t) is the infrastructure's available computing resources and κ is the price factor; the corresponding energy consumption is then calculated from γ i (t), the mobility of service S i (t); the local computation delay and e comp , the percentage of energy consumed per computation unit; the local download delay and e down , the percentage of energy consumed per download unit; and the communication delay and e comm , the percentage of energy consumed per communication unit.
3. The real-time service migration method based on lightweight learning according to claim 1, wherein the optimization problem P1 of step 2) minimizes, over the execution slots of length T, the average service execution delay T i (t) and the average resource lease cost P i (t), where α ij (t) is the service migration device decision variable, β ijh (t) is the service data packet acquisition decision variable, γ i (t) is the service mobility decision variable, and S is the total number of service requests to be executed; P1 is subject to the constraints C1-C7:
constraint C1 ensures that the execution delay of a service cannot exceed its tolerable delay, guaranteeing the user's quality of experience, where T i (t) is the service execution delay and the bound is the tolerable delay of K i -class services; constraint C2 ensures that the migration portion of each service completes within the communicable time, i.e. the migration execution delay must not exceed the communicable delay between the two devices; constraint C3 ensures that each service provider does not exhaust its remaining energy, preventing service interruption due to energy exhaustion, bounding the execution energy consumption by the remaining energy of every device D i (t) in the device set; C4 defines the upper limit of the device's communication capacity with the infrastructure, where α ij (t) is the device migration decision variable and R ch (t) is the upper limit on the number of channels; constraint C5 restricts the binary decision variables α ij (t) and β ijh (t), the decision variables of the device migration and service data packet acquisition modes respectively, with n t the total number of devices; C6: γ i (t) ∈ [0,1] gives the value range of the service mobility; constraint C7 indicates that when the mobility γ i (t) = 0, no service provider provides cooperation.
4. The real-time service migration method based on lightweight learning according to claim 1 or 3, wherein the optimal matching strategy of step 3) decomposes the optimization problem P1 into two sub-problems: P4, constrained by C3-C5, and P5, constrained by C1, C2, and C7.
5. The real-time service migration method based on lightweight learning according to claim 4, wherein: the step 4) specifically comprises the following steps:
step 4.1):
at the beginning of each time slot in an updating round, the matching times D of the equipment are initialized firstly j (t), visit, and number of service matches S i (t) visit is 0, wherein
Figure FDA0003777850290000034
Then, the preference value of each device is initialized to 0, i.e.
Figure FDA0003777850290000035
And initializing the tuning parameters
Figure FDA0003777850290000036
Is ∞;
step 4.2):
for each service request, firstly, the optimal mobility executed on each migration device is obtained, and the matching decision alpha is obtained according to the obtained optimal mobility ij (t) and beta ijh (t), lower limit of mobility
Figure FDA0003777850290000037
Comprises the following steps:
Figure FDA0003777850290000038
wherein
Figure FDA0003777850290000039
Is K i The tolerable delay of the service is such that,
Figure FDA00037778502900000310
is a homeThe time delay of the data packet is obtained,
Figure FDA00037778502900000311
for calculating the time delay locally when
Figure FDA00037778502900000312
Upper limit of mobility
Figure FDA00037778502900000313
Comprises the following steps:
Figure FDA00037778502900000314
wherein
Figure FDA00037778502900000315
In order to delay the communication between the two devices,
Figure FDA00037778502900000316
in order to wait for the communication to be delayed,
Figure FDA00037778502900000317
the time delay is obtained for the data packet,
Figure FDA00037778502900000318
in order to delay the time of communication,
Figure FDA00037778502900000319
to calculate the time delay. When the temperature is higher than the set temperature
Figure FDA00037778502900000320
Figure FDA00037778502900000321
Upper limit of mobility
Figure FDA00037778502900000322
Comprises the following steps:
Figure FDA00037778502900000323
where
Figure FDA00037778502900000324
is the tolerable delay of the service,
Figure FDA00037778502900000325
is the waiting delay,
Figure FDA00037778502900000326
is the packet acquisition delay,
Figure FDA00037778502900000327
is the communication delay, and
Figure FDA00037778502900000328
is the computation delay. Since the optimal delay is achieved when the local delay and the migration delay are equal, the optimal migration ratio
Figure FDA00037778502900000329
can be expressed as:
Figure FDA0003777850290000041
where
Figure FDA0003777850290000042
is the local packet acquisition delay,
Figure FDA0003777850290000043
is the local computation delay,
Figure FDA0003777850290000044
is the migrated execution delay,
Figure FDA0003777850290000045
is the packet acquisition delay,
Figure FDA0003777850290000046
is the communication delay,
Figure FDA0003777850290000047
is the computation delay, and
Figure FDA0003777850290000048
is the actual execution delay of the task;
if
Figure FDA0003777850290000049
then γ_i(t) = 0; and when
Figure FDA00037778502900000410
Figure FDA00037778502900000411
the migration ratio is obtained as follows:
Figure FDA00037778502900000412
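The ratio computation in step 4.2) can be sketched in Python under a linear delay model. This is an illustration only, not the claim's exact closed forms (those appear only as formula images); the linear split of computation between the local and migrated branches, and all names, are assumptions:

```python
def migration_ratio(d_local_pkt, d_local_comp, d_wait, d_pkt, d_comm, d_comp, d_tol):
    """Clip the delay-balancing migration ratio gamma to the feasible
    window [lower, upper] implied by the tolerable delay d_tol.

    Assumed linear model (an illustration, not the claim's closed form):
      local(gamma)  = d_local_pkt + (1 - gamma) * d_local_comp
      remote(gamma) = d_wait + d_pkt + d_comm + gamma * d_comp
    """
    # Optimal ratio: local delay equals migration delay (step 4.2's rule).
    gamma_star = (d_local_pkt + d_local_comp - d_wait - d_pkt - d_comm) \
                 / (d_local_comp + d_comp)
    # Lower bound: the local branch alone must meet the tolerable delay.
    lower = max(0.0, 1.0 - (d_tol - d_local_pkt) / d_local_comp)
    # Upper bound: the migrated branch alone must meet the tolerable delay.
    upper = min(1.0, (d_tol - d_wait - d_pkt - d_comm) / d_comp)
    if lower > upper:      # infeasible window: set gamma_i(t) = 0, stay local
        return 0.0
    return min(max(gamma_star, lower), upper)
```

With a 20-unit tolerable delay and symmetric 10-unit computation delays, the balanced ratio lands strictly inside the feasible window; when the tolerable delay is too tight the window is empty and the service stays local (γ_i(t) = 0), mirroring the branch in the claim.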
step 4.3):
for each candidate migration device, if constraints C1-C7 are satisfied, the benefit U_ij(t) is added to the preference list of service S_i(t) in descending order; otherwise the benefit U_ij(t) with γ_i(t) = 0 is added to the preference list; a priority value is then obtained for each service request based on all preference values, where
Figure FDA00037778502900000413
is the maximum preference value of the service;
step 4.4):
for each service S_i(t) in the service request set, a matching operation is executed against the device set
Figure FDA00037778502900000414
The specific procedure is as follows: a suitable match for S_i(t) is sought from the set
Figure FDA00037778502900000415
with the expected value U_ij(t) defined as
Figure FDA00037778502900000416
If both
Figure FDA00037778502900000417
and
Figure FDA00037778502900000418
hold, then S_i(t) is migrated to device D_j(t) and the matching result is returned; otherwise the tuning parameter δ_j(t) is updated to
Figure FDA00037778502900000419
where
Figure FDA00037778502900000420
is the preference value of service S_i(t),
Figure FDA00037778502900000421
is the preference value of device D_j(t), and U_ij(t) is the expected value.
Step 4.5):
if no matching result is returned in step 4.4), an update operation is performed: for each device not previously matched
Figure FDA00037778502900000422
the adjustment factor is updated to min{δ, Δ_j(t)}, where δ is the adjustment factor initialized to ∞; all accessed service preference values are adjusted by the adjustment variable to
Figure FDA00037778502900000423
all vehicle preference values are adjusted to
Figure FDA00037778502900000424
and all adjustment variables Δ_j(t) are updated to Δ_j(t) - δ.
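Steps 4.3)-4.5) together behave like a price-adjusted matching: preference lists ordered by benefit U_ij(t), a per-device tuning parameter that rises whenever a device is contested, and re-matching of evicted services. A compact stand-in is a Bertsekas-style auction; the sketch below assumes constraints C1-C7 have already filtered the utility table, and all names are hypothetical:

```python
def auction_match(utility, eps=1e-3):
    """Auction-style matching of services to migration devices.

    utility[s][d] is the benefit U_ij(t) of placing service s on device d
    (pairs infeasible under C1-C7 are assumed absent).  The device 'price'
    plays the role of the tuning parameter delta_j(t): it starts at 0 and
    rises each time the device is contested, reshaping the services'
    effective preferences in the spirit of step 4.5).
    """
    price = {d: 0.0 for prefs in utility.values() for d in prefs}
    owner = {}                          # device -> service currently holding it
    unmatched = list(utility)
    while unmatched:
        s = unmatched.pop()
        # effective preference after the price (adjustment) correction
        vals = {d: u - price[d] for d, u in utility[s].items()}
        best = max(vals, key=vals.get)
        second = sorted(vals.values())[-2] if len(vals) > 1 else 0.0
        price[best] += vals[best] - second + eps   # raise the adjustment
        if best in owner:                          # evict previous occupant
            unmatched.append(owner[best])
        owner[best] = s
    return {s: d for d, s in owner.items()}
```

On a two-service, two-device table the auction steers the contested device to the service that values it most at the margin, the same tie-breaking effect the preference/adjustment updates in steps 4.4)-4.5) aim for.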
6. The method according to claim 1, wherein the agent policy in step 5) is updated as follows:
step 5.1):
after obtaining an initial expert demonstration data set ε_0 and expert policy
Figure FDA00037778502900000425
each agent obtains an initial agent model by training a neural network; the agent network estimates actions
Figure FDA00037778502900000426
from observed states, and trains its policy by fitting the observed states and estimated actions according to a loss function
Figure FDA00037778502900000427
and the expert policy π_e(a|s); the loss function
Figure FDA00037778502900000428
is as follows:
Figure FDA0003777850290000051
where
Figure FDA0003777850290000052
denotes the agent policy, π_e(a|s) denotes the expert policy, a denotes the actual action, s denotes the observed state,
Figure FDA0003777850290000053
denotes the predicted action,
Figure FDA0003777850290000054
denotes the frozen parameter, θ_0 denotes the initial parameters, and
Figure FDA0003777850290000055
denotes the expectation; the parameters are therefore updated as follows:
Figure FDA0003777850290000056
where ι_b denotes the learning rate of the base learner, and
Figure FDA0003777850290000057
denotes the gradient of the loss function
Figure FDA0003777850290000058
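Step 5.1) is behaviour cloning: regress the agent's predicted actions onto the expert's. A minimal sketch with a linear policy and mean-squared loss standing in for the claim's neural network and loss function (names and the MSE choice are assumptions):

```python
import numpy as np

def bc_update(theta, states, expert_actions, lr=0.05):
    """One behaviour-cloning step: theta <- theta - iota_b * grad L(theta),
    with L the mean-squared error between the agent's predicted actions
    a_hat = states @ theta and the expert demonstration actions."""
    a_hat = states @ theta
    grad = 2.0 * states.T @ (a_hat - expert_actions) / len(states)
    return theta - lr * grad
```

Iterating this update on a fixed demonstration set drives the agent policy toward the expert policy, which is all the initial agent model of step 5.1) requires.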
step 5.2):
in each update period
Figure FDA0003777850290000059
where
Figure FDA00037778502900000510
denotes the set of update periods, the agent obtains a partially updated expert trajectory ε_l and records the scaling and shifting of model migration through meta-learning, with the meta-learning parameter in period l denoted as ω_l; the meta-learning process converts
Figure FDA00037778502900000511
into
Figure FDA00037778502900000512
and obtains ω_l through
Figure FDA00037778502900000513
the goal of meta-learning is to make
Figure FDA00037778502900000514
similar to
Figure FDA00037778502900000515
Step 5.3):
after the l-th training of the agent, the distributed agent follows the policy
Figure FDA00037778502900000516
and makes migration decisions based on the observed state until it enters the coverage of other infrastructure or until the (l+1)-th update period, at which point the agent repeats step 5.2) to update.
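The scaling-and-shifting of step 5.2) can be illustrated as fitting an affine map that carries the period-(l-1) parameters toward a policy fitted on the partially updated expert trajectory ε_l. The single shared (scale, shift) pair and the least-squares fit are simplifying assumptions; the claim's exact transform is given only by its formula images:

```python
import numpy as np

def fit_omega(theta, theta_target):
    """Fit omega_l = (scale, shift) so that scale * theta + shift
    approximates theta_target, the parameters fitted on the partially
    updated expert trajectory.  A shared scalar pair is used for brevity
    instead of per-coordinate vectors."""
    A = np.stack([theta, np.ones_like(theta)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, theta_target, rcond=None)
    return scale, shift

def transform(theta, omega):
    """Apply the scale-and-shift meta transform to the base parameters."""
    scale, shift = omega
    return scale * theta + shift
```

When the target parameters really are an affine image of the old ones, the fit recovers them exactly, which is the "make the transformed policy similar to the updated expert" goal stated above.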
7. The method according to claim 6, wherein: the meta update of the agent comprises two sub-stages, namely base learner training and meta learner training; in the l-th period, an expert trajectory ε_{e,l} is randomly extracted from the data set, then
Figure FDA00037778502900000517
samples of data are used to train the base learning model and
Figure FDA00037778502900000518
samples are used to train the meta learning model, with
Figure FDA00037778502900000519
the temporary parameter θ'_l is initialized from the parameter θ_{l-1} of period l-1 and used for fine tuning, updated as:
Figure FDA00037778502900000520
where ι_b is the learning rate of the base learner,
Figure FDA00037778502900000521
is the gradient of the loss function of the base learner,
Figure FDA00037778502900000522
is the frozen parameter, θ_{l-1} is the parameter of period l-1, and ω_{l-1} is the meta learner parameter; the meta learner parameter ω_l is thus updated as:
Figure FDA00037778502900000523
where ι_m is the learning rate of the meta learner,
Figure FDA00037778502900000524
is the gradient of the loss function of the meta learner,
Figure FDA00037778502900000525
is the frozen parameter, θ'_l is the temporary parameter, and ω_{l-1} is the meta learner parameter of period l-1; the agent parameter θ_l can thus be updated as:
Figure FDA00037778502900000526
where ι_m is the learning rate of the meta learner,
Figure FDA00037778502900000527
is the gradient of the loss function of the meta learner,
Figure FDA00037778502900000528
is the frozen parameter, θ'_l is the temporary parameter, and ω_l is the meta learner parameter of period l.
CN202210921760.0A 2022-08-02 2022-08-02 Lightweight learning-based live service migration method Active CN115484304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210921760.0A CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210921760.0A CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Publications (2)

Publication Number Publication Date
CN115484304A true CN115484304A (en) 2022-12-16
CN115484304B CN115484304B (en) 2024-03-19

Family

ID=84422715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210921760.0A Active CN115484304B (en) 2022-08-02 2022-08-02 Lightweight learning-based live service migration method

Country Status (1)

Country Link
CN (1) CN115484304B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012149788A1 (en) * 2011-09-30 2012-11-08 Huawei Technologies Co., Ltd. Service establishment method and system, radio network controller and user terminal
CN110275758A (en) * 2019-05-09 2019-09-24 Chongqing University of Posts and Telecommunications Intelligent virtual network function migration method
CN111858009A (en) * 2020-07-30 2020-10-30 Aerospace Ouhua Information Technology Co., Ltd. Task scheduling method of mobile edge computing system based on migration and reinforcement learning
CN111885155A (en) * 2020-07-22 2020-11-03 Dalian University of Technology Vehicle-mounted task collaborative migration method for vehicle networking resource fusion
CN113114722A (en) * 2021-03-17 2021-07-13 Chongqing University of Posts and Telecommunications Virtual network function migration method based on edge network
US20210250838A1 (en) * 2018-10-22 2021-08-12 Huawei Technologies Co., Ltd. Mobile handover method and related device
CN113434212A (en) * 2021-06-24 2021-09-24 Beijing University of Posts and Telecommunications Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN113543074A (en) * 2021-06-15 2021-10-22 Nanjing University of Aeronautics and Astronautics Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
CN114362810A (en) * 2022-01-11 2022-04-15 Chongqing University of Posts and Telecommunications Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDREAS POLZE: "Timely Virtual Machine Migration for Pro-active Fault Tolerance", 2011 14TH IEEE INTERNATIONAL SYMPOSIUM ON OBJECT/COMPONENT/SERVICE-ORIENTED REAL-TIME DISTRIBUTED COMPUTING WORKSHOPS, 21 April 2011 (2011-04-21) *
LIU KUN: "Design and Simulation Implementation of a 5G-Based Satellite-Terrestrial Integrated Core Network", Information Science and Technology Series, 15 March 2022 (2022-03-15) *
TANG LUN; ZHOU YU; TAN QI; WEI YANNAN; CHEN QIANBIN: "Reinforcement Learning Based Virtual Network Function Migration Algorithm for 5G Network Slicing", Journal of Electronics & Information Technology, no. 03, 15 March 2020 (2020-03-15) *

Also Published As

Publication number Publication date
CN115484304B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
Chen et al. Deep reinforcement learning for computation offloading in mobile edge computing environment
Wang et al. Dynamic UAV deployment for differentiated services: A multi-agent imitation learning based approach
Zhou et al. Incentive-driven deep reinforcement learning for content caching and D2D offloading
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
Xie et al. Adaptive online decision method for initial congestion window in 5G mobile edge computing using deep reinforcement learning
CN110290011A (en) Dynamic Service laying method based on Lyapunov control optimization in edge calculations
Wu et al. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT
CN111262940A (en) Vehicle-mounted edge computing application caching method, device and system
CN114143891A (en) FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
Wang et al. Distributed reinforcement learning for age of information minimization in real-time IoT systems
CN113822456A (en) Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN115278708A (en) Mobile edge computing resource management method for federal learning
CN113573320A (en) SFC deployment method based on improved actor-critic algorithm in edge network
CN116489226A (en) Online resource scheduling method for guaranteeing service quality
CN116185523A (en) Task unloading and deployment method
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Nguyen et al. Intelligent blockchain-based edge computing via deep reinforcement learning: solutions and challenges
Huang et al. Reinforcement learning for cost-effective IoT service caching at the edge
Qadeer et al. Hrl-edge-cloud: Multi-resource allocation in edge-cloud based smart-streetscape system using heuristic reinforcement learning
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
Chen et al. Distributed task offloading game in multiserver mobile edge computing networks
CN111901833A (en) Unreliable channel transmission-oriented joint service scheduling and content caching method
CN115484304A (en) Real-time service migration method based on lightweight learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant