CN115514769A

CN115514769A - Satellite elastic internet resource scheduling method, system, computer equipment and medium

Info

Publication number: CN115514769A
Application number: CN202211125448.7A
Authority: CN
Inventors: 罗志勇; 林天豪; 黄澳
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-09-14
Filing date: 2022-09-14
Publication date: 2022-12-23
Anticipated expiration: 2042-09-14
Also published as: CN115514769B; ZA202305873B

Abstract

The invention provides a satellite elastic internet resource scheduling method, system, computer equipment and medium. A delay-sensitive satellite elastic internet architecture is established based on a many-to-many mode between LEO satellites and user terminals; a minimized-delay optimization model is established according to that architecture and a satellite elastic internet resource scheduling model; the minimized-delay optimization model is converted into a corresponding Markov decision model; and the Markov decision model is solved to obtain a resource scheduling strategy.

Description

Satellite elastic internet resource scheduling method, system, computer equipment and medium

Technical Field

The invention relates to the technical field of satellite resource scheduling, in particular to a satellite elastic internet resource scheduling method and system based on deep reinforcement learning, computer equipment and a storage medium.

Background

With the continuous expansion of internet users, user equipment accessing a network is also increasing, which brings about greater performance requirements on time delay and energy consumption balance. However, due to the limitations of cost and technical conditions, in many global areas, such as deserts, deep seas, forests and other complex natural geographic environments, regional network coverage cannot be realized by deploying ground base stations in a large scale, and the satellite internet becomes a reliable way for ensuring efficient communication in such areas.

Traditional satellite network systems are independent of one another, and their networking mechanisms and related protocols are markedly heterogeneous. This causes a serious "chimney" (silo) phenomenon and greatly limits the utilization efficiency of network space resources. Therefore, how to establish an efficient, flexible and agile satellite internet architecture, how to offload tasks efficiently, and how to reasonably schedule and allocate limited resources among different task demands have become important research directions for improving the resource utilization efficiency of the space-ground integrated network.

Existing satellite internet resource scheduling methods mainly sink computing capacity to LEO satellites: an MEC server is deployed on each LEO satellite using edge computing technology, which shortens the physical distance between the user and the MEC server and achieves a better balance between delay and energy consumption. Existing methods for solving the resulting optimization problem fall into two types: 1) solving the task scheduling problem with the Hungarian method; 2) proving that the optimization problem is convex and solving it with the KKT conditions. Although both methods can solve the satellite internet resource scheduling problem to a certain extent, each has application defects. The Hungarian method reduces delay compared with traditional algorithms, but it only considers the scenario in which a single server is responsible for computing tasks, and does not consider that computing and storage resources are scarce in satellite communication scenarios, so load imbalance easily arises. The convex-optimization approach with the KKT conditions considers only the efficiency of the solving process and ignores the task queuing delay caused by limited resources and a large number of tasks. In other words, existing solutions do not account for the differences between satellite communication scenarios and terrestrial internet scenarios; they cannot schedule resources effectively under realistic conditions of scarce computing and storage resources, large task volumes and long queuing delays, and their practicability is low.

Therefore, there is an urgent need for a reasonable resource scheduling method that can make full use of the limited resources on the satellite, provide low-delay, high-quality and high-security services for users in all regions of the world, and address the problem of high delay sensitivity.

Disclosure of Invention

The invention aims to provide a satellite elastic internet resource scheduling method that considers the resource scheduling problem among edge servers under satellite network edge computing in a many-to-many mode. The method builds a delay-sensitive satellite elastic internet architecture based on SDN/NFV technology and the TSN (Time-Sensitive Networking) security protocol, obtains a corresponding delay optimization model, and solves for a resource scheduling strategy using a delay optimization algorithm built on a deep reinforcement learning framework. It effectively remedies the application defects of conventional satellite resource scheduling schemes, makes full use of the computing and storage resources of the satellite, provides priority-based services, optimizes average delay performance, and achieves real and effective load balancing.

In order to achieve the above objects, it is necessary to provide a satellite flexible internet resource scheduling method, system, computer device and storage medium for solving the above technical problems.

In a first aspect, an embodiment of the present invention provides a method for scheduling satellite flexible internet resources, where the method includes the following steps:

establishing a time delay sensitive satellite elastic Internet architecture based on a many-to-many mode of an LEO satellite and a user side; the LEO satellite corresponds to an MEC server;

establishing a satellite elastic internet resource scheduling model according to the time delay sensitive satellite elastic internet architecture;

establishing a minimum time delay optimization model according to the satellite elastic internet resource scheduling model;

converting the minimized time delay optimization model into a corresponding Markov decision model;

and solving the Markov decision model to obtain a resource scheduling strategy.

In a second aspect, an embodiment of the present invention provides a satellite flexible internet resource scheduling system, where the system includes:

the architecture construction module is used for establishing a time delay sensitive satellite elastic Internet architecture based on a many-to-many mode of an LEO satellite and a user side; the LEO satellite corresponds to an MEC server;

the first modeling module is used for establishing a satellite elastic internet resource scheduling model according to the time delay sensitive satellite elastic internet architecture;

the second modeling module is used for establishing a minimized time delay optimization model according to the satellite elastic internet resource scheduling model;

the model conversion module is used for converting the minimized time delay optimization model into a corresponding Markov decision model;

and the strategy solving module is used for solving the Markov decision model to obtain a resource scheduling strategy.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the above method.

The application provides a satellite elastic internet resource scheduling method, system, computer equipment and storage medium. The method establishes a delay-sensitive satellite elastic internet architecture based on a many-to-many mode between LEO satellites and user terminals, establishes a satellite elastic internet resource scheduling model according to that architecture, establishes a minimized-delay optimization model according to the resource scheduling model, converts the minimized-delay optimization model into a corresponding Markov decision model, and solves the Markov decision model to obtain a resource scheduling strategy. Compared with the prior art, the method models the optimization problem comprehensively and effectively for a real application scenario with limited resources and task queuing delay. It makes full use of the computing and storage resources of the satellite and avoids the performance impact of queuing delay; it meets the diversified requirements of user terminals and improves service quality through task priorities; and it improves the computational efficiency of scheduling and allocating massive service resources through a delay optimization algorithm built specifically on a deep reinforcement learning algorithm. It thereby realizes intelligent and efficient satellite elastic internet resource scheduling with priority-based service, improved delay performance and guaranteed load balancing, and has strong generalization capability and high practical value.

Drawings

FIG. 1 is a schematic view of an application scenario of the satellite elastic internet resource scheduling method according to an embodiment of the present invention;

FIG. 2 is a flowchart of the satellite elastic internet resource scheduling method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the delay-sensitive satellite elastic internet architecture according to an embodiment of the present invention;

FIG. 4 is a pseudocode diagram of the DDRA algorithm designed based on the TD3 deep reinforcement learning architecture in an embodiment of the present invention;

FIG. 5 is a diagram of the relevant parameters of the satellite elastic internet resource scheduling model according to an embodiment of the present invention;

FIG. 6 is a graph of the effect of the discount rate on the convergence of the DDRA algorithm in an embodiment of the present invention;

FIG. 7 is a convergence comparison diagram of the DDRA algorithm and the SAC algorithm in an embodiment of the present invention;

FIG. 8 is a graph comparing, for different algorithms under computing resource allocation, the effect of the task computation amount on average delay performance when the task data amount is uniformly distributed, in an embodiment of the present invention;

FIG. 9 is a graph comparing, for different algorithms under computing resource allocation, the effect of the task computation amount on average delay performance when the task data amount obeys a normal distribution, in an embodiment of the present invention;

FIG. 10 is a graph comparing, for different algorithms under computing resource allocation, the effect of the task computation workload on average delay performance when the task computation workload is uniformly distributed, in an embodiment of the present invention;

FIG. 11 is a graph comparing, for different algorithms under computing resource allocation, the effect of the task data amount on average delay performance when the task computation workload obeys a normal distribution, in an embodiment of the present invention;

FIG. 12 is a graph comparing, for different algorithms under computing resource allocation, the effect of the number of users on average delay performance when the task data amount and the task computation amount obey normal distributions, in an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of the satellite elastic internet resource scheduling system according to an embodiment of the present invention;

FIG. 14 is an internal structural view of a computer device in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments, and it is obvious that the embodiments described below are part of the embodiments of the present invention, and are only used for illustrating the present invention, but not for limiting the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The satellite elastic internet resource scheduling method provided by the invention can be applied to the scenario shown in FIG. 1, in which user terminals and MEC servers perform satellite elastic internet resource scheduling in a many-to-many mode, and can realize intelligent and efficient satellite elastic internet resource scheduling with priority-based service, improved delay performance and guaranteed load balancing. The following embodiments explain the satellite elastic internet resource scheduling method of the invention in detail.

In one embodiment, as shown in fig. 2, there is provided a satellite flexible internet resource scheduling method, including the following steps:

S11, establishing a delay-sensitive satellite elastic internet architecture based on a many-to-many mode between LEO satellites and user terminals; each LEO satellite corresponds to one MEC server, and the MEC servers communicate with one another to achieve load balancing. The delay-sensitive satellite elastic internet architecture can be understood as the satellite internet architecture shown in FIG. 3, which is built on SDN/NFV technology and incorporates the basic idea of IEEE 802.1Qcc in TSN. The MEC server of each LEO satellite provides a task offloading service for ground data nodes (user terminals), and functions such as satellite resource configuration, route configuration and network configuration are managed in real time across the space-ground integrated network so that the architecture matches the diversified task requirements of the user terminals as closely as possible. Specifically, the step of establishing the delay-sensitive satellite elastic internet architecture based on the many-to-many mode between LEO satellites and user terminals includes:

virtualizing the computing resources and storage resources of the MEC server corresponding to each LEO satellite based on SDN/NFV technology, and, in combination with the TSN (Time-Sensitive Networking) security protocol, integrating multiple delay-related protocols to establish the delay-sensitive satellite elastic internet architecture, in which satellite resource configuration, route forwarding and network configuration are managed with delay minimization as the optimization target.

S12, establishing a satellite elastic internet resource scheduling model according to the delay-sensitive satellite elastic internet architecture. The resource scheduling model can be understood as follows: based on the architecture of step S11, the data plane considers a scenario in which multiple LEO satellites cover multiple ground data nodes (user terminals), and the model determines the resource allocation strategy when user terminals offload tasks. It mainly comprises a communication model, a task model and a computation model. The task model takes task priority into account; the computation model can be established according to a preset task division principle and mainly comprises a local offloading task computation model and an MEC offloading task computation model. Specifically, the step of establishing a satellite elastic internet resource scheduling model according to the delay-sensitive satellite elastic internet architecture includes:

acquiring the data transmission rate uploaded to the MEC server by each user side, and constructing the communication model according to the data transmission rate based on a many-to-many mode; the communication model is represented as:

in the formula, the model is defined over the set of user terminals and the set of MEC servers; $r_{i,j}(t)$, $I_{i,j}$, $h_{i,j}(t)$ and $s_{i,j}$ respectively represent the transmission rate, inter-cell interference power, channel gain and straight-line distance when user terminal $u_i$ offloads a task to MEC server $b_j$ in time slot $t$, and the model likewise gives the transmission delay and transmission energy consumption of that offloading; $W$ represents the channel bandwidth; $\sigma^2$ represents the noise power of the user equipment; $z_i(t)$ represents the data size of the task $Q_i(t)$ generated by user terminal $u_i$ in time slot $t$; $c$ represents the speed of light; and $p_i(t)$ represents the transmission power with which user terminal $u_i$ transmits its signal in time slot $t$;
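A minimal sketch of how a communication model with these symbols is commonly written is given here for orientation. The formula itself is not reproduced in the text, so the Shannon-capacity form of the rate, the split of delay into serialization plus propagation time, and all function names are assumptions rather than the expressions of the invention.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c, in metres per second


def transmission_rate(bandwidth_hz, tx_power_w, channel_gain,
                      noise_power_w, interference_w):
    """Assumed Shannon-type rate: r = W * log2(1 + p*h / (sigma^2 + I))."""
    sinr = tx_power_w * channel_gain / (noise_power_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def transmission_delay(data_bits, rate_bps, distance_m):
    """Assumed delay: time to push z bits at rate r plus propagation s / c."""
    return data_bits / rate_bps + distance_m / SPEED_OF_LIGHT


def transmission_energy(tx_power_w, data_bits, rate_bps):
    """Assumed energy: transmit power times the time spent transmitting."""
    return tx_power_w * data_bits / rate_bps


if __name__ == "__main__":
    r = transmission_rate(1e6, 3.0, 1.0, 0.5, 0.5)
    print(r, transmission_delay(2e6, r, 600e3), transmission_energy(3.0, 2e6, r))
```

The propagation term is kept separate because, for LEO links, the distance $s_{i,j}$ is large enough that it is usually not negligible next to the serialization time.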

it should be noted that, because the data size after a task has been processed is very small, the communication model of the invention does not consider the energy consumption and delay of downloading results; in addition, in the many-to-many scenario with multiple user terminals and multiple LEO nodes, the handover delay of satellite communication can be ignored under existing conditions, and because the user terminals are far from the LEO satellites, the distance from a user terminal to the MEC server can be taken as approximately equal to its distance to the LEO satellite; the following task model is established by focusing only on the "end-edge" model, without considering the influence of the cloud;

based on a load balancing principle, constructing the task model according to task calculated amount, task data amount and task priority; the task model is represented as:

$$Q_i(t) = \{\omega_i(t),\ z_i(t),\ \mathrm{pri}_i(t)\}$$

wherein $Q_i(t)$ represents the task generated by user terminal $u_i$ in time slot $t$; $\omega_i(t)$ represents the amount of computation required by task $Q_i(t)$, i.e. the CPU frequency required to complete the task; $z_i(t)$ represents the data size of task $Q_i(t)$; $\mathrm{pri}_i(t)$ represents the priority of task $Q_i(t)$, with $\mathrm{pri}_i(t) \in \{1, 2, \ldots, PN\}$, where $PN$ represents the number of priority levels;
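The task triple can be carried around as a small record. The sketch below is one possible representation; the class name `Task`, its field names and the `validate` helper are hypothetical and chosen only to mirror the three components named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task Q_i(t); field names are illustrative, not from the source."""
    omega: float  # required amount of computation, omega_i(t)
    z: float      # data size, z_i(t)
    pri: int      # priority level, pri_i(t), expected in 1..PN

    def validate(self, num_priorities: int) -> None:
        """Raise ValueError if the task violates the ranges stated in the text."""
        if self.omega < 0 or self.z < 0:
            raise ValueError("computation amount and data size must be non-negative")
        if not 1 <= self.pri <= num_priorities:
            raise ValueError(f"priority must lie in 1..{num_priorities}")


if __name__ == "__main__":
    task = Task(omega=1e9, z=2e6, pri=2)
    task.validate(num_priorities=3)
    print(task)
```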

dividing the tasks of each user terminal into local offloading tasks and MEC offloading tasks, and constructing a corresponding local offloading task computation model and MEC offloading task computation model respectively; the local offloading task computation model can be understood as a model of the processing delay and energy consumption when a local offloading task is processed locally by the user terminal, and may be represented as:

wherein the two quantities given by the model respectively represent the delay with which user terminal $u_i$ processes the local task $Q_i^L$ and the corresponding energy consumption; $f_i^L$ represents the local CPU frequency of user terminal $u_i$; $\rho_i$ represents the power coefficient, i.e. the energy consumed per CPU cycle;
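A hedged sketch of the local computation model follows. The formula is not reproduced in the text, so both expressions are assumptions: delay is taken as the required computation divided by the local CPU frequency, and energy as the per-cycle coefficient $\rho_i$ multiplied by the number of cycles. The function names are hypothetical.

```python
def local_delay(omega_cycles: float, local_cpu_hz: float) -> float:
    """Assumed local processing delay: omega / f_i^L."""
    if local_cpu_hz <= 0:
        raise ValueError("local CPU frequency must be positive")
    return omega_cycles / local_cpu_hz


def local_energy(rho_joule_per_cycle: float, omega_cycles: float) -> float:
    """Assumed local energy: rho_i (energy per CPU cycle) times cycles used."""
    return rho_joule_per_cycle * omega_cycles


if __name__ == "__main__":
    print(local_delay(2e9, 1e9), local_energy(1e-9, 2e9))
```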

the MEC offloading task computation model can be understood as an MEC server processing delay model. It is established on the consideration that multiple user terminals may offload tasks to the same MEC server and that tasks offloaded by different user terminals have different priorities, so it accounts for both the processing delay of a task offloaded to the MEC server and the queuing delay of that task. The MEC server processing delay model is expressed as:

wherein the model involves the processing delay when user terminal $u_i$ offloads a task to MEC server $b_j$ in time slot $t$; the CPU frequency of MEC server $b_j$; the proportion of computing resources that MEC server $b_j$ allocates to the MEC offloading task $Q_i^E$ in time slot $t$; and the average queuing delay of the MEC offloading task $Q_i^E$ with priority $\mathrm{pri}_i$; it should be noted that the energy consumed by the MEC server in processing tasks is not considered here;
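The quantities listed above suggest a processing delay made of a compute term and a queuing term. The sketch below assumes the compute term is the required computation divided by the allocated share of the server's CPU frequency; this form and the function name `mec_processing_delay` are assumptions, since the source formula is not reproduced.

```python
def mec_processing_delay(omega_cycles: float, share: float,
                         server_cpu_hz: float, queue_wait_s: float) -> float:
    """Assumed MEC delay: omega / (kappa * f_j) plus average queuing delay."""
    if not 0.0 < share <= 1.0:
        raise ValueError("allocated resource proportion must lie in (0, 1]")
    if server_cpu_hz <= 0:
        raise ValueError("server CPU frequency must be positive")
    return omega_cycles / (share * server_cpu_hz) + queue_wait_s


if __name__ == "__main__":
    # Half of a 4 GHz server for a 2e9-cycle task that waited 0.25 s in queue.
    print(mec_processing_delay(2e9, 0.5, 4e9, 0.25))
```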

the calculation of the above average queuing delay depends on the queuing model actually adopted. For example, the task queuing model is assumed to be a non-preemptive priority queuing model (M/M/N queue), in which tasks of the same priority are processed on a first-come-first-served basis; the arrivals of tasks of any priority in any time slot are assumed to follow a Poisson distribution with parameter $\lambda_i(t)$, the processing time of the MEC server is assumed to follow an exponential distribution with parameter $\mu(t)$, and the storage space of the servers is assumed to be sufficiently large; according to the task model there are $PN$ priority levels, and the average queuing delay of a task $Q_i$ with priority $\mathrm{pri}_i$ is then:

in the formula, the total arrival rate is:

and the constraint conditions include:

wherein $O_j$ represents the set of computing tasks offloaded to MEC server $b_j$.

In particular, the average queuing delay of tasks $Q_i$ of different priorities at MEC server $b_j$ is calculated as follows:

when the task priority is $\mathrm{pri} = 1$,

when the task priority is $\mathrm{pri} = 2$,

the remaining priority levels follow by analogy; when the task priority is $\mathrm{pri} = PN$,

and the corresponding task queuing delay is calculated accordingly.
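For orientation, the per-priority waiting times of a non-preemptive priority M/M/N queue can be computed with the standard textbook result, sketched below. This is not claimed to be the exact expression of the invention: it assumes every priority class shares the same service rate $\mu$, that the N servers are identical, and that the first entry of `arrival_rates` is the highest priority. The function names are hypothetical.

```python
from math import factorial


def erlang_c(n_servers: int, offered_load: float) -> float:
    """Probability that an arriving task has to wait in an M/M/N queue."""
    rho = offered_load / n_servers
    if rho >= 1.0:
        raise ValueError("queue is unstable: total load must be below N * mu")
    top = offered_load ** n_servers / factorial(n_servers) / (1.0 - rho)
    bottom = sum(offered_load ** k / factorial(k) for k in range(n_servers)) + top
    return top / bottom


def priority_queue_waits(arrival_rates, mu, n_servers):
    """Mean queuing delay per priority class, highest priority first.

    Uses W_k = W0 / ((1 - s_{k-1}) * (1 - s_k)), where W0 is the mean
    residual delay seen on arrival and s_k is the cumulative utilisation
    of classes 1..k.
    """
    total_rate = sum(arrival_rates)
    w0 = erlang_c(n_servers, total_rate / mu) / (n_servers * mu)
    waits, s_prev = [], 0.0
    for lam in arrival_rates:
        s_curr = s_prev + lam / (n_servers * mu)
        waits.append(w0 / ((1.0 - s_prev) * (1.0 - s_curr)))
        s_prev = s_curr
    return waits


if __name__ == "__main__":
    print(priority_queue_waits([0.2, 0.3], mu=1.0, n_servers=1))
```

With a single class the result reduces to the ordinary M/M/N waiting time, which is a convenient sanity check on any implementation.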

S13, establishing a minimized-delay optimization model according to the satellite elastic internet resource scheduling model; given the dynamic nature of task generation at the nodes, the minimized-delay optimization model takes as its optimization target the minimization of the average processing delay of all tasks generated within the preset time range corresponding to the time slot set; specifically, the step of establishing a minimized-delay optimization model according to the satellite elastic internet resource scheduling model includes:

calculating the average task processing delay of each time slot within the preset time range according to the satellite elastic internet resource scheduling model; the average task processing delay of each time slot can be calculated from the transmission delay in the communication model and the MEC offloading task computation model; for example, the total delay generated by all tasks in each time slot $t$ is calculated as:

wherein $l$ represents the total number of user terminals that offload tasks to MEC servers in time slot $t$;

based on the total delay obtained above, the average task processing delay per user terminal is calculated as that total delay divided by $l$;

averaging the per-slot average task processing delays to obtain the average task processing delay within the preset time range, the average task processing delay within the preset time range corresponding to the time slot set $T$ being:

constructing a minimized-delay optimization model by taking minimization of the average task processing delay within the preset time range as the optimization target; the objective function of the minimized-delay optimization model is represented as:

in the formula, $d(t)$ represents the total delay generated by all tasks in time slot $t$, which is made up of the transmission delay and the processing delay when user terminal $u_i$ offloads a task to MEC server $b_j$ in time slot $t$; $l$ represents the total number of user terminals; $\kappa$ represents the matrix of computing resource proportions allocated by each MEC server to the different user terminals;
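The objective can be illustrated numerically. The sketch below assumes that each task's delay is the sum of its transmission delay and its MEC processing delay, that the per-slot average is the slot total divided by the number of offloading user terminals, and that the objective is the plain mean of those per-slot averages; the function names are hypothetical.

```python
def slot_average_delay(task_delays):
    """Average delay in one slot: total delay d(t) divided by l."""
    if not task_delays:
        raise ValueError("a slot needs at least one offloaded task")
    return sum(task_delays) / len(task_delays)


def horizon_average_delay(delays_per_slot):
    """Mean of the per-slot averages over the whole time slot set."""
    if not delays_per_slot:
        raise ValueError("the time range must contain at least one slot")
    return sum(slot_average_delay(d) for d in delays_per_slot) / len(delays_per_slot)


if __name__ == "__main__":
    # Each inner list holds (transmission + processing) delay per task, in seconds.
    print(horizon_average_delay([[1.0, 3.0], [2.0]]))
```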

the constraint conditions of the minimized-delay optimization model are expressed as follows:

wherein $O_j$ represents the set of computing tasks offloaded to MEC server $b_j$; the constraints involve the transmission energy consumption when user terminal $u_i$ offloads a task to MEC server $b_j$ in time slot $t$; $T$ represents the total number of time slots; $E_i$ represents the upper limit of the transmission energy consumption of user terminal $u_i$; and the constraints further involve the proportion of computing resources that MEC server $b_j$ allocates to task $Q_i$ of user terminal $u_i$ in time slot $t$;
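A feasibility check in the spirit of these constraints might look like the sketch below. The constraint formulas are not reproduced in the text, so the specific conditions are assumptions: each allocated proportion lies in [0, 1], the proportions handed out by one server sum to at most 1, and each user terminal's transmission energy averaged over the T slots stays within its cap $E_i$. All names are hypothetical.

```python
def constraints_satisfied(kappa, energy_per_slot, energy_caps, tol=1e-9):
    """Check assumed resource-share and energy constraints.

    kappa[j][i]            share of server j's computing resources given to user i
    energy_per_slot[t][i]  transmission energy of user i in slot t
    energy_caps[i]         upper limit E_i for user i
    """
    for shares in kappa:
        if any(k < 0.0 or k > 1.0 for k in shares):
            return False
        if sum(shares) > 1.0 + tol:
            return False
    num_slots = len(energy_per_slot)
    for i, cap in enumerate(energy_caps):
        mean_energy = sum(slot[i] for slot in energy_per_slot) / num_slots
        if mean_energy > cap + tol:
            return False
    return True


if __name__ == "__main__":
    print(constraints_satisfied([[0.5, 0.5]], [[1.0, 2.0], [3.0, 2.0]], [2.0, 2.0]))
```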

S14, converting the minimized-delay optimization model into a corresponding Markov decision model; a Markov decision process (MDP) model can be understood as a 4-tuple $\langle S, A, R, \chi \rangle$, where $S$ represents the state space, $A$ represents the action space, $R$ represents the reward function, and $\chi \in [0, 1]$ represents the discount coefficient; specifically, the step of converting the minimized-delay optimization model into a corresponding Markov decision model includes:

constructing the state space of the Markov decision model according to the environment state of each time slot; the environment state of each time slot is represented as:

wherein $s(t)$ represents the environment state of time slot $t$; $\omega(t)$, $z(t)$ and $\mathrm{pri}(t)$ respectively represent the computation amounts, data sizes and priorities of all tasks in time slot $t$; the remaining component of the state represents the offloading strategy of the user terminals;

constructing the action space of the Markov decision model according to the agent's action in each time slot; the agent's action in each time slot is expressed as:

$$a(t) = \kappa(t)$$

wherein $a(t)$ represents the agent's action in time slot $t$; $\kappa(t)$ represents the proportions of computing resources allocated by each MEC server to the different user terminals in time slot $t$;

constructing the reward function of the Markov decision model according to the agent's action reward in each time slot; the agent's action reward is represented as:

wherein $r(t)$ represents the agent's action reward in time slot $t$;

it should be noted that a conventional MDP problem maximizes the cumulative reward, whereas the objective function of the minimized-delay optimization model of the invention minimizes the average delay; therefore, on the basis of the objective function, the negative of the delay is used as the reward when the constraint conditions are satisfied, and the reward is set to a minimum value when the constraint conditions are not satisfied.
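The state and reward described above can be sketched as follows. The flat-list state layout, the tuple format of a task, and the penalty constant `INFEASIBLE_REWARD` are illustrative assumptions; the text only states that the reward is the negative delay when the constraints hold and a minimum value otherwise.

```python
INFEASIBLE_REWARD = -1e6  # assumed stand-in for the "minimum value" in the text


def build_state(tasks, offload_targets):
    """Flatten one slot's observation into a list of floats.

    tasks            list of (omega, z, pri) triples, one per user terminal
    offload_targets  index of the MEC server each user terminal offloads to
    """
    omega = [float(t[0]) for t in tasks]
    z = [float(t[1]) for t in tasks]
    pri = [float(t[2]) for t in tasks]
    return omega + z + pri + [float(j) for j in offload_targets]


def reward(average_delay, feasible, penalty=INFEASIBLE_REWARD):
    """Negative average delay if the constraints hold, else a large penalty."""
    return -average_delay if feasible else penalty


if __name__ == "__main__":
    print(build_state([(1.0, 2.0, 1), (3.0, 4.0, 2)], [0, 1]))
    print(reward(2.5, True), reward(2.5, False))
```

Negating the delay lets a standard reward-maximizing agent minimize delay without any change to the learning algorithm.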

S15, solving the Markov decision model to obtain a resource scheduling strategy; in order to ensure the high efficiency and accuracy of the solution, the DDRA algorithm is preferably established on the basis of a reinforcement learning algorithm TD3 framework, and a resource scheduling strategy is output through a trained neural network model; specifically, the step of solving the markov decision model to obtain the resource scheduling policy includes:

and constructing a time delay optimization DDRA algorithm based on a reinforcement learning algorithm TD3 framework, and solving the Markov decision model through the time delay optimization DDRA algorithm to obtain the resource scheduling strategy.

When solving the markov decision model based on the reinforcement learning algorithm TD3, 6 networks need to be trained as follows: the method comprises the following steps of an Actor network, a Target Actor network, a first Critic network, a first Target Critic network, a second Critic network and a second Target Critic network, wherein the training process comprises the following steps:

the parameter of the Actor network is $\phi$; its input is the current training environment state $s(t)$, and its output is the current agent policy $\pi(a(t) \mid s(t); \phi)$, i.e. the action probability distribution of the current time slot; the parameter of the corresponding Target Actor network is $\phi'$; its input is the current implementation environment state $s'(t)$, and its output is the current agent policy $\pi(a'(t) \mid s'(t); \phi')$, i.e. the action probability distribution of the current time slot;

the parameter of the first Critic network is $\theta_1$; its inputs are the current agent policy $\pi(a(t) \mid s(t); \phi)$ and the current training environment state $s(t)$, and its output is the Q function of taking action $a(t)$ in the current state, i.e. the cumulative expected value of taking a specific resource allocation action in the current satellite communication environment state; the parameter of the corresponding first Target Critic network is $\theta'_1$; its inputs are the current agent policy $\pi(a'(t) \mid s'(t); \phi')$ and the implementation environment state $s'(t)$, and its output is the Q function of taking action $a'(t)$ in the current state, i.e. the cumulative expected value of taking a specific resource allocation action in the current satellite communication environment state;

the parameter of the second Critic network is $\theta_2$; its inputs are the current agent policy $\pi(a(t) \mid s(t); \phi)$ and the current training environment state $s(t)$, and its output is the Q function of taking action $a(t)$ in the current state, i.e. the cumulative expected value of taking a specific resource allocation action in the current satellite communication environment state; the parameter of the corresponding second Target Critic network is $\theta'_2$; its inputs are the current agent policy $\pi(a'(t) \mid s'(t); \phi')$ and the implementation environment state $s'(t)$, and its output is the Q function of taking action $a'(t)$ in the current state;

the pseudo code of the process of training the neural network by adopting the time delay optimization DDRA algorithm constructed based on the reinforcement learning algorithm TD3 algorithm frame is shown in FIG. 4, and comprises the following steps:

1) interacting with the environment using the Actor network, and storing the result tuple $\{s(t), a(t), r(t), s(t+1)\}$ obtained from each interaction step into a cache;

2) randomly sampling a batch of tuples $\{s(t), a(t), r(t), s(t+1)\}$ from the cache to calculate $a(t+1)$ and the Q function; the action $a(t+1)$ taken in the current state is:

correspondingly, the Q function in the current state is:

where $\sigma'$ is the variance, $c$ is the upper limit, and $\gamma$ is the Q function update rate;

3) calculating and updating the parameter $\theta_1$ of the first Critic network and the parameter $\theta_2$ of the second Critic network; the corresponding update method is as follows:

parameter of the first Critic network:

parameter of the second Critic network:

4) Updating parameter phi of the Actor network, parameter phi ' of the Target Actor network and parameter theta ' of the first Target Critic network at preset step interval d ' ₁ And parameter θ 'of a second Target Critic network' ₂ (ii) a The corresponding parameter updating method comprises the following steps:

parameters of Actor network

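Parameter of the Target Actor network: $\phi' \leftarrow \tau\phi + (1-\tau)\phi'$

Parameter of the first Target Critic network: $\theta'_1 \leftarrow \tau\theta_1 + (1-\tau)\theta'_1$

Parameter of the second Target Critic network: $\theta'_2 \leftarrow \tau\theta_2 + (1-\tau)\theta'_2$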

wherein,

$\eta$ is the Actor network update rate, and $\tau$ is the Target network update rate;

5) Repeating the above steps until the neural network converges.
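The target computation in step 2) and the Target network update in step 4) can be sketched as follows. This is a minimal illustration that assumes the standard TD3 forms (clipped Gaussian noise on the Target Actor output, the smaller of the two Target Critic estimates, and soft updates); it is not taken from FIG. 4. The function names, the toy networks, the action bounds [0, 1] and the treatment of `sigma` as a standard deviation are assumptions made for the example, and `gamma` plays the role that standard TD3 calls the discount factor.

```python
import numpy as np

def target_action(target_actor, next_state, sigma, c, rng, low=0.0, high=1.0):
    """Target Actor output plus Gaussian noise clipped to [-c, c] (assumed TD3 form)."""
    a = target_actor(next_state)
    noise = np.clip(rng.normal(0.0, sigma, size=a.shape), -c, c)
    # Keep the action inside the assumed valid range of resource proportions.
    return np.clip(a + noise, low, high)

def td3_target(reward, next_state, next_action, target_critic1, target_critic2, gamma):
    """Q target built from the smaller of the two Target Critic estimates."""
    q1 = target_critic1(next_state, next_action)
    q2 = target_critic2(next_state, next_action)
    return reward + gamma * np.minimum(q1, q2)

def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied array by array."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(online_params, target_params)]

# Toy stand-ins for the networks (hypothetical; real networks would be trained models).
rng = np.random.default_rng(0)
actor_t = lambda s: np.full((s.shape[0], 1), 0.5)
q1_t = lambda s, a: np.full((s.shape[0], 1), 2.0)
q2_t = lambda s, a: np.full((s.shape[0], 1), 1.0)

s_next = np.zeros((4, 3))   # batch of 4 next states with 3 features each
r = np.ones((4, 1))         # batch of rewards
a_next = target_action(actor_t, s_next, sigma=0.2, c=0.5, rng=rng)
y = td3_target(r, s_next, a_next, q1_t, q2_t, gamma=0.75)
```

In this toy batch the second Target Critic gives the smaller estimate, so every target equals the reward plus 0.75 times that estimate. Each Critic is then regressed toward `y`, and `soft_update` is applied only every d steps together with the Actor update.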

To make task offloading decisions, the Actor network, the first Critic network, the second Critic network, the Target Actor network, the first Target Critic network and the second Target Critic network are trained according to the above steps over multiple training rounds, using the experience stored in the cache; finally, the resource scheduling strategy is jointly determined by the trained Target Actor network, first Target Critic network and second Target Critic network.

The method of the application addresses the resource scheduling problem among edge servers in satellite network edge computing. A delay-sensitive satellite elastic internet architecture is constructed based on SDN/NFV technology and the TSN security protocol, a corresponding delay optimization model is derived, and the resource scheduling strategy is obtained by solving that model with a delay optimization algorithm constructed on a deep reinforcement learning framework. The optimization problem is modeled comprehensively and effectively for a realistic application scenario with limited resources and task queuing delay, so that the computing resources and storage resources of the satellites are fully utilized and the performance impact of queuing delay is avoided, while the division of task priorities meets the diversified requirements of the user side and improves service quality. The purpose-built delay optimization algorithm also improves the computational efficiency of scheduling and allocating massive service resources. The result is intelligent and efficient satellite elastic internet resource scheduling that serves tasks by priority, improves delay performance and guarantees load balance; it effectively overcomes the application defects of existing satellite resource scheduling schemes and has strong generalization capability and high practical value.

In order to verify the application effect and performance of the DDRA scheme, the application also implements a related simulation comparison experiment, and the specific simulation process is as follows:

the simulation platform is Python 3.9. Three LEO satellites at an altitude of 784 km fly over a square area of 1200 m × 1200 m; by default, 24 user terminals (the number is adjustable) are randomly distributed on the ground, and each user terminal can either offload a task to the MEC server of exactly one LEO satellite or process it locally. Because the altitude is far greater than the extent of the ground area, the distance between each user terminal and an MEC server is approximated by the altitude of the LEO satellite. Since the LEO satellites considered form a constellation, the loss due to channel switching and the influence of the communication window are negligible; that is, a user can communicate with an LEO satellite at any moment, and the channel gain can be obtained in advance through sensing technology. In addition, the transmission power of each user terminal is set to 23 dBm, the channel bandwidth is 20 MHz, and a free-space fading channel model is used. The task parameters considered are mainly the computation amount, the data amount, the priority and the possible energy consumption. During calculation, the task offloading decision of the previous time slot needs to be taken into account, and the amount of computing resources that each LEO satellite can provide in each time slot is assumed to be a random value within a given range. In the DDRA algorithm, after careful tuning, every neural network has 4 layers, namely 1 input layer, 2 hidden layers and 1 output layer; the hidden layers of the Actor network have 2048 and 1024 neurons respectively, and the hidden layers of the Critic networks have 1024 and 512 neurons respectively. Among the training parameters of the model, the learning rate is 0.001 and the discount factor is 0.75. The remaining parameter settings are detailed in FIG. 5;
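The link parameters stated above (23 dBm transmission power, 20 MHz bandwidth, 784 km distance, free-space channel) can be combined into a rough rate estimate as sketched below. The carrier frequency, the thermal noise density and the absence of antenna gains and interference are assumptions made for this illustration, since the remaining settings are only given in FIG. 5; the helper names are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

def free_space_gain(distance_m, carrier_hz):
    """Linear free-space channel gain, (c / (4 * pi * d * f))^2."""
    return (C_LIGHT / (4 * math.pi * distance_m * carrier_hz)) ** 2

def shannon_rate(bandwidth_hz, tx_power_w, gain, noise_w, interference_w=0.0):
    """Achievable rate in bit/s from the Shannon capacity formula."""
    sinr = tx_power_w * gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1 + sinr)

# Values stated in the simulation setup.
tx_power_w = dbm_to_watt(23)
bandwidth_hz = 20e6
distance_m = 784e3

# Assumed values (not given in the text): 2 GHz carrier, -174 dBm/Hz noise density.
carrier_hz = 2e9
noise_w = dbm_to_watt(-174) * bandwidth_hz

gain = free_space_gain(distance_m, carrier_hz)
rate = shannon_rate(bandwidth_hz, tx_power_w, gain, noise_w)
prop_delay = distance_m / C_LIGHT  # one-way propagation delay in seconds

print(f"rate = {rate:.3e} bit/s, propagation delay = {prop_delay * 1e3:.2f} ms")
```

Because antenna gains are omitted, the absolute rate printed here is only illustrative; the one-way propagation delay over 784 km is about 2.6 ms regardless of those assumptions.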

simulation A: given that the action space is continuous, the SAC algorithm, whose performance is similar to that of the DDRA algorithm, is selected, and the convergence speed of the DDRA algorithm is analyzed and compared with that of other reinforcement learning algorithms; the SAC algorithm is obtained by improving on the DQN algorithm and likewise prevents over-estimation in the calculation of the Critic network; the difference is that SAC introduces policy entropy into the reward function, that is, the Agent is encouraged to explore more while maximizing the reward, so that it searches for a globally optimal solution rather than a locally optimal one as far as possible; as a result, more training time is required and the training process is less efficient;

simulation B: analyzing and comparing the optimized delay of the DDRA algorithm with that of other resource scheduling algorithms (the SAC algorithm and a local optimization algorithm are selected); the local optimization algorithm (denoted LOA) is based on a model that ignores the influence of priority and queuing delay, so the optimization problem can be proved to be convex, and the optimal solution is obtained by solving the Lagrangian of the problem under the KKT conditions; in LOA, this optimal solution is then substituted into the objective function of the minimized delay optimization model of the invention to obtain the average delay;

simulation C: with the mean data amount fixed at 3.5 Mb and the computation amount following a uniform distribution and a normal distribution respectively, studying the influence of the task computation amount on average delay performance under computing resource allocation;

simulation D: with the mean computation amount fixed at 5.5 Gcycle and the data amount following a uniform distribution and a normal distribution respectively, studying the influence of the task data amount on average delay performance under computing resource allocation;

simulation E: with the means of the task computation amount and the task data amount being 5 Gcycle and 1 Mb respectively and both following a normal distribution, studying the influence of the number of users on average delay performance under computing resource allocation.
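To illustrate the kind of closed-form solution that the KKT conditions yield for the local optimization algorithm in simulation B, the sketch below solves a simplified problem: minimizing the sum of processing delays on one server when priority and queuing delay are ignored. The objective, the single-server setting and the function names are assumptions made for this example and are not the exact LOA formulation. For minimizing $\sum_i \omega_i / (\kappa_i F)$ subject to $\sum_i \kappa_i = 1$, setting the gradient of the Lagrangian to zero gives $\kappa_i \propto \sqrt{\omega_i}$.

```python
import numpy as np

def kkt_allocation(workloads):
    """Shares minimizing sum(w_i / (k_i * F)) subject to sum(k_i) = 1: k_i ~ sqrt(w_i)."""
    root = np.sqrt(np.asarray(workloads, dtype=float))
    return root / root.sum()

def total_processing_delay(workloads, shares, server_hz):
    """Sum of per-task processing delays w_i / (k_i * F), in seconds."""
    w = np.asarray(workloads, dtype=float)
    k = np.asarray(shares, dtype=float)
    return float(np.sum(w / (k * server_hz)))

# Hypothetical example: three tasks (CPU cycles) on a 10 GHz server.
workloads = [4e9, 1e9, 1e9]
server_hz = 10e9

kkt_shares = kkt_allocation(workloads)
equal_shares = np.full(len(workloads), 1.0 / len(workloads))

delay_kkt = total_processing_delay(workloads, kkt_shares, server_hz)
delay_equal = total_processing_delay(workloads, equal_shares, server_hz)
print(kkt_shares, delay_kkt, delay_equal)
```

For these workloads the square-root rule assigns half of the server to the largest task and a quarter to each of the others, which gives a lower total delay than an equal split.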

Based on the above simulation experiment, the following results were obtained:

the analysis results shown in FIGS. 6-7 were obtained from simulation A. FIG. 6 compares the influence of the discount factor on the convergence of the DDRA algorithm; in deep reinforcement learning, the discount factor is an important hyper-parameter of the Markov decision process. Setting the discount factor to 0.05, 0.75 and 0.95 in turn shows that a value that is too large or too small is harmful: at 0.05 and 0.95 the average delay fluctuates greatly, indicating that the system is unstable in these 2 cases, and at 0.05 in particular a large peak appears around episode = 50; at 0.75 the system converges stably and at a moderate speed, so this value is used as the default hyper-parameter of the model in the subsequent analysis of task computation amount and task data amount. The discount factor must therefore be chosen reasonably, otherwise the system becomes unstable or converges to an inappropriate value. FIG. 7 compares the convergence of different reinforcement learning algorithms under SMTOM: DDRA has converged to about 1.8 s by episode = 100, whereas the SAC algorithm converges to almost 2 s only after episode = 5000. The DDRA algorithm thus converges significantly faster than the SAC algorithm and reaches a lower final average delay; that is, in the second scenario under SMTOM, DDRA has higher computational efficiency and a better optimization result, which is very important in satellite communication scenarios where resources are scarce.
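One common way to read the three discount factors compared above is through the effective planning horizon $1/(1-\gamma)$, a general rule of thumb in reinforcement learning rather than something stated in this application. The short sketch below evaluates it for the tested values; the function name is hypothetical.

```python
def effective_horizon(gamma):
    """Rule-of-thumb number of future steps that matter for a discount factor gamma."""
    return 1.0 / (1.0 - gamma)

for gamma in (0.05, 0.75, 0.95):
    print(gamma, round(effective_horizon(gamma), 2))
# prints 1.05, 4.0 and 20.0 steps respectively
```

Under this heuristic, a factor of 0.05 makes the Agent nearly myopic, while 0.95 weights roughly 20 future time slots; this is offered only as intuition for the instability seen at the two extremes, not as an explanation verified by the simulation.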

Based on simulation B and simulation C, the analysis results shown in FIGS. 8-9 were obtained. From FIGS. 8-9 it can be concluded that the average delay of the system tends to increase as the task computation amount increases, and that the upward trend of the SAC algorithm is less stable. Meanwhile, the DDRA algorithm provided by the application and the local optimization algorithm LOA perform better, with average delays lower than that of the SAC algorithm. In addition, when the computation amount is less than 4.5 Gcycle, the DDRA algorithm performs worse than LOA, but when the computation amount is greater than 4.5 Gcycle, it performs better than LOA. The reason is that LOA does not consider queuing delay: when the task computation amount is larger, the processing time of each task increases and so does the queuing delay, and continuing to optimize locally then degrades the delay performance. This shows the advantage of the DDRA algorithm in application scenarios with larger computation amounts, and further shows the advantage of SMTMO in satellite communication scenarios.

When the data amount follows a uniform distribution (FIG. 8), the average delay of the SAC algorithm is 37.8% higher, and that of the DDRA algorithm 2.7% lower, than that of the LOA algorithm. FIG. 9 shows that the average delay performance of each algorithm when the data amount follows a normal distribution differs little from that under a uniform distribution: when the computation amount is relatively low, the average delay of each algorithm is relatively stable, and in delay performance LOA is superior to DDRA, which in turn is superior to SAC; after the computation amount increases, the results of the SAC algorithm have a large variance and may not converge to an optimal solution within the limited number of training rounds, while the DDRA algorithm overtakes LOA in delay performance, which reflects the superiority of the DDRA algorithm under SMTMO at high task computation amounts.

Based on simulation B and simulation D, the analysis results shown in FIGS. 10-11 were obtained. From FIGS. 10-11 it can be concluded that the task data amount has little influence on task offloading performance: as the data amount increases, the average delay of each algorithm shows only a slight upward trend, because the data amount affects only the transmission delay, which usually accounts for a small share of the overall delay, and under LOA the optimal value obtained through the gradient is independent of the task data amount. Among the three algorithms, the SAC algorithm shows large jitter, whereas LOA and DDRA show a steady upward trend by comparison. In addition, when the computation amount follows a uniform distribution (FIG. 10), the average delay of the SAC algorithm is 1.9% lower, and that of the DDRA algorithm 11.5% lower, than that of LOA; as can be seen from FIG. 11, when the data amount follows a normal distribution, the average delay of the SAC algorithm is 2.1% higher, and that of the DDRA algorithm 11.2% lower, than that of LOA. Therefore, at higher task computation amounts, the average delay performance of the SAC algorithm is similar to that of LOA, while that of the DDRA provided by the invention is about 11% better than that of LOA.

Based on simulation B and simulation E, the analysis results shown in FIG. 12 were obtained. FIG. 12 shows that the average delay under each algorithm gradually increases with the number of users; under the local optimization algorithm in particular, the increase is roughly exponential. Comparing the three algorithms, the DDRA algorithm has the best average delay performance, with an average delay of 6.5 s across the different numbers of users in the simulation; the SAC algorithm is slightly inferior, with an average delay of 6.8 s; the local optimization algorithm has an average delay of 13.1 s, because it neglects queuing delay: the effect is small when the number of users is small, but as the number of users increases the average delay rises rapidly, so that delay performance with many user tasks degrades greatly and the user experience suffers. Overall, across the different numbers of user terminals, the average delay of the DDRA algorithm is 4.4% lower than that of the SAC algorithm and 50.4% lower than that of the local optimization algorithm.
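The two percentages quoted for FIG. 12 follow directly from the reported average delays of 6.5 s, 6.8 s and 13.1 s. The short check below reproduces them as relative reductions with respect to each baseline; the function name is chosen for illustration only.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0

ddra, sac, loa = 6.5, 6.8, 13.1  # average delays in seconds, as reported for FIG. 12

print(round(relative_reduction(sac, ddra), 1))  # 4.4
print(round(relative_reduction(loa, ddra), 1))  # 50.4
```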

The above simulation results show that, for the resource scheduling problem in the satellite elastic internet scenario, the present application establishes the satellite elastic internet architecture and the satellite elastic internet resource scheduling model, sets on this basis the objective of optimizing the average delay under energy constraints, and then uses the DDRA algorithm, built on the framework of the reinforcement learning algorithm TD3, to solve the resulting non-convex optimization problem; this technical scheme can significantly reduce the average delay of user-side task offloading and achieves genuinely effective satellite resource scheduling.

It should be noted that, although the steps in the above-described flowcharts are shown in sequence as indicated by arrows, the steps are not necessarily executed in sequence as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders.

In one embodiment, as shown in FIG. 13, there is provided a satellite elastic internet resource scheduling system, the system comprising:

the architecture construction module 1 is used for establishing a time delay sensitive satellite elastic internet architecture based on a many-to-many mode of an LEO satellite and a user side; the LEO satellite corresponds to an MEC server;

the first modeling module 2 is used for establishing a satellite elastic internet resource scheduling model according to the time delay sensitive satellite elastic internet architecture;

the second modeling module 3 is used for establishing a minimized time delay optimization model according to the satellite elastic internet resource scheduling model;

a model conversion module 4, configured to convert the minimized delay optimization model into a corresponding markov decision model;

and the strategy solving module 5 is used for solving the Markov decision model to obtain a resource scheduling strategy.

For specific limitations of the satellite elastic internet resource scheduling system, reference may be made to the above limitations of the satellite elastic internet resource scheduling method, which are not repeated here. All or part of the modules in the satellite elastic internet resource scheduling system can be implemented by software, hardware or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

FIG. 14 shows an internal structure diagram of a computer device in one embodiment; the computer device may be a terminal or a server. As shown in FIG. 14, the computer device includes a processor, a memory, a network interface, a display screen and an input device, which are connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the satellite elastic internet resource scheduling method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.

It will be appreciated by those of ordinary skill in the art that the architecture shown in FIG. 14 is merely a block diagram of a portion of the architecture associated with aspects of the present application and is not intended to limit the computing devices to which aspects of the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a similar arrangement of components.

In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method.

In summary, the satellite elastic internet resource scheduling method and system provided by the embodiments of the present invention implement the following technical scheme: a delay-sensitive satellite elastic internet architecture is established based on a many-to-many mode of LEO satellites and user terminals; a satellite elastic internet resource scheduling model is established according to that architecture; a minimized delay optimization model is established according to the resource scheduling model; the minimized delay optimization model is converted into a corresponding Markov decision model; and the Markov decision model is solved to obtain a resource scheduling strategy.

The embodiments in this specification are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described briefly, and the relevant points can be found in the corresponding description of the method embodiment. It should be noted that the technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The above-mentioned embodiments only express several preferred embodiments of the present application, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for those skilled in the art, without departing from the technical principle of the present invention, several improvements and substitutions can be made, and these improvements and substitutions should also be regarded as the protection scope of the present application. Therefore, the protection scope of the present patent application shall be subject to the protection scope of the claims.

Claims

1. A satellite elastic internet resource scheduling method, characterized by comprising the following steps:

and solving the Markov decision model to obtain a resource scheduling strategy.

2. The satellite elastic internet resource scheduling method of claim 1, wherein the step of establishing the delay-sensitive satellite elastic internet architecture based on the many-to-many mode of the LEO satellite and the user side comprises:

based on SDN/NFV technology, virtualizing the computing resources and storage resources of the MEC server corresponding to each LEO satellite, combining multiple delay-related protocols in conjunction with the TSN security protocol, and establishing the delay-sensitive satellite elastic internet architecture on the principle of managing satellite resource configuration, route forwarding and network configuration with minimized delay as the optimization objective.

3. The satellite elastic internet resource scheduling method of claim 1, wherein the satellite elastic internet resource scheduling model comprises a communication model, a task model and a computation model; the computation model comprises a local offload task computation model and an MEC offload task computation model.

4. The satellite elastic internet resource scheduling method of claim 3, wherein the step of establishing the satellite elastic internet resource scheduling model according to the delay-sensitive satellite elastic internet architecture comprises:

in the formula ,

wherein ,

and

respectively representing a user side set and an MEC server set;

together with the transmission delay and the transmission energy consumption, $r_{i,j}(t)$, $I_{i,j}$, $h_{i,j}(t)$ and $s_{i,j}$ characterize the offloading of a task from user terminal $u_i$ to MEC server $b_j$ in time slot $t$, and respectively represent the transmission rate, inter-cell interference power, channel gain and straight-line distance; $W$ represents the channel bandwidth; $\sigma^2$ represents the noise power of the user equipment; $z_i(t)$ represents the data size of the task $Q_i(t)$ generated by user terminal $u_i$ in time slot $t$; $c$ represents the speed of light; $p_i(t)$ represents the transmission power of the signal transmitted by user terminal $u_i$ in time slot $t$;

based on a load balancing principle, according to the task calculated amount, the task data amount and the task priority, constructing the task model; the task model is represented as:

$Q_i(t) = \{\omega_i(t), z_i(t), pri_i(t)\}$

wherein $Q_i(t)$ represents the task generated by user terminal $u_i$ in time slot $t$; $\omega_i(t)$ represents the amount of computation required by task $Q_i(t)$; $z_i(t)$ represents the data size of task $Q_i(t)$; $pri_i(t)$ represents the priority of task $Q_i(t)$, with $pri_i(t) \in \{1, 2, \dots, PN\}$, where $PN$ represents the number of priority levels;

dividing the tasks of each user terminal into local offload tasks and MEC offload tasks, and respectively constructing a corresponding local offload task computation model and MEC offload task computation model; the local offload task computation model is represented as:

wherein ,

and

respectively represent the processing delay and the corresponding energy consumption of user terminal $u_i$ for the local task $Q_i^L$; $f_i^L$ represents the local CPU frequency of user terminal $u_i$; $\rho_i$ represents the power coefficient of the energy consumed per CPU cycle;

the MEC offload task computation model is represented as:

wherein ,

represents the CPU frequency of MEC server $b_j$;

represents the average queuing delay of an MEC offload task $Q_i^E$ with priority $pri_i$.

5. The satellite elastic internet resource scheduling method of claim 1, wherein the step of establishing a minimized delay optimization model according to the satellite elastic internet resource scheduling model comprises:

calculating the average task processing delay of each time slot within a preset time range according to the satellite elastic internet resource scheduling model;

averaging the average task processing delays of the time slots to obtain the average task processing delay within the preset time range;

in the formula ,

and

respectively represent the transmission delay and the processing delay when user terminal $u_i$ offloads a task to MEC server $b_j$ in time slot $t$; $l$ represents the total number of user terminals; $K$ represents the matrix of computing resource proportions allocated by each MEC server to the different user terminals;

wherein $O_j$ represents the set of computing tasks offloaded to MEC server $b_j$;

represents the proportion of computing resources allocated by MEC server $b_j$ to task $Q_i$ of user terminal $u_i$ in time slot $t$.

6. The satellite elastic internet resource scheduling method of claim 5, wherein the step of converting the minimized delay optimization model into a corresponding Markov decision model comprises:

wherein $s(t)$ represents the environment state in time slot $t$; $\omega(t)$, $z(t)$ and $pri(t)$ respectively represent the computation amounts, data amounts and priorities of all tasks in time slot $t$; $\gamma_i^j(t)$ represents the offloading policy of the user terminal;

$a(t) = \kappa(t)$

wherein $a(t)$ represents the action of the Agent in time slot $t$; $\kappa(t)$ represents the proportions of computing resources allocated by each MEC server to the different user terminals in time slot $t$;

constructing a reward function of the Markov decision model according to the action reward of the Agent in each time slot; the Agent action reward is expressed as:

where $r(t)$ represents the Agent action reward in time slot $t$.

7. The satellite elastic internet resource scheduling method of claim 1, wherein the step of solving the Markov decision model to obtain the resource scheduling strategy comprises:

8. A satellite elastic internet resource scheduling system, the system comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
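As an illustrative aid to the task model, computation models and reward recited in claims 4 to 6, the sketch below expresses them in Python. The task tuple follows $Q_i(t) = \{\omega_i(t), z_i(t), pri_i(t)\}$ as recited; the delay, energy and reward formulas are commonly used MEC forms assumed for this example, because the corresponding formulas are not reproduced in the text above, and all function and field names are hypothetical.

```python
from dataclasses import dataclass

C_LIGHT = 299_792_458.0  # speed of light in m/s

@dataclass
class Task:
    omega: float  # required computation, in CPU cycles
    z: float      # data size, in bits
    pri: int      # priority level in 1..PN

def local_delay(task, f_local):
    """Assumed local processing delay: cycles divided by local CPU frequency."""
    return task.omega / f_local

def local_energy(task, f_local, rho):
    """Assumed local energy: rho * f^2 joules per cycle times the number of cycles."""
    return rho * f_local ** 2 * task.omega

def mec_delay(task, rate, distance, kappa, f_mec, queue_delay=0.0):
    """Assumed offloading delay: transmission + propagation + processing + queuing."""
    transmission = task.z / rate + distance / C_LIGHT
    processing = task.omega / (kappa * f_mec)  # kappa is the allocated share of the server
    return transmission + processing + queue_delay

def reward(delays):
    """Assumed reward: the negative average delay of the time slot."""
    return -sum(delays) / len(delays)

# Hypothetical numbers for one task.
task = Task(omega=5e9, z=1e6, pri=1)
d_local = local_delay(task, f_local=1e9)
d_mec = mec_delay(task, rate=1e6, distance=784e3, kappa=0.5, f_mec=10e9)
r_t = reward([d_local, d_mec])
```

With these hypothetical numbers, offloading is faster than local processing even though transmission adds about one second, which is the trade-off the resource proportion $\kappa(t)$ is meant to balance across user terminals.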