CN115514769B - Satellite elastic Internet resource scheduling method, system, computer equipment and medium

Info

Publication number: CN115514769B
Application number: CN202211125448.7A
Authority: CN
Inventors: 罗志勇; 林天豪; 黄澳
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-09-14
Filing date: 2022-09-14
Publication date: 2023-06-06
Anticipated expiration: 2042-09-14
Also published as: ZA202305873B; CN115514769A

Abstract

The invention provides a satellite elastic Internet resource scheduling method, system, computer device and medium. A delay-sensitive satellite elastic Internet architecture is established based on a many-to-many mode between LEO satellites and user terminals; a satellite elastic Internet resource scheduling model is established according to that architecture; a minimum-delay optimization model is established according to the resource scheduling model; the minimum-delay optimization model is converted into a corresponding Markov decision model; and the Markov decision model is solved to obtain a resource scheduling strategy. The invention makes full use of the computing and storage resources of the satellites, avoids the performance impact of queuing delay, meets the diversified requirements of user terminals and improves service quality. Because the solution is based on a deep reinforcement learning algorithm, the computational efficiency of resource scheduling is improved, so that resource scheduling with prioritized service, lower delay and balanced load is achieved, with strong generalization capability and high practical value.

Description

Satellite elastic Internet resource scheduling method, system, computer equipment and medium

Technical Field

The invention relates to the technical field of satellite resource scheduling, in particular to a satellite elastic internet resource scheduling method, system, computer equipment and storage medium based on deep reinforcement learning.

Background

With the continuous growth in the number of internet users, the number of user devices accessing the network has also proliferated, bringing more demanding requirements on the balance between delay and energy consumption. However, owing to cost and technical limitations, network coverage cannot be achieved through large-scale deployment of ground base stations in many complex natural environments such as deserts, the deep sea and forests, and the satellite internet has become a reliable way to guarantee efficient communication in such regions.

Traditional satellite network systems are independent of one another, and their networking mechanisms and related protocols are markedly heterogeneous, which causes a serious "chimney" (siloed) phenomenon and greatly limits the utilization efficiency of network space resources. How to build an efficient, flexible and agile satellite internet architecture, and how to offload tasks efficiently and schedule limited resources reasonably across different task demands, has become an important research direction for improving the resource utilization efficiency of the space-ground integrated network.

Existing satellite internet resource scheduling methods mainly sink computing power to the LEO satellites: by deploying MEC servers on the LEO satellites, edge computing shortens the physical distance between the MEC server and the user and thereby achieves a better balance between delay and energy consumption. Existing methods for solving the resulting optimization problem fall mainly into two types: 1) solving the task scheduling problem with the Hungarian method; 2) proving that the optimization problem is convex and solving it with the KKT conditions. Although both methods can solve the satellite internet resource scheduling problem to a certain extent, each has application defects. The Hungarian method reduces delay compared with traditional algorithms, but it considers only the scenario in which a single server is responsible for computing a task, and it ignores the load imbalance that easily arises because computing and storage resources are scarce in satellite communication. Proving convexity and solving with the KKT conditions addresses only the efficiency of the solution process and ignores the queuing delay that tasks suffer when resources are limited and the number of tasks is large. In other words, existing solutions do not distinguish the satellite communication scenario from the terrestrial internet scenario and cannot schedule resources effectively under real conditions such as very scarce resources, limited computing and storage capacity, and long queuing delays for large numbers of tasks, so their practicability is low.

Therefore, there is a need for a reasonable resource scheduling method that makes full use of the limited resources on satellites, provides users around the world with low-delay, high-quality and highly secure services, and addresses the problem of high delay sensitivity.

Disclosure of Invention

The invention aims to provide a satellite elastic Internet resource scheduling method that addresses resource scheduling among edge servers under satellite network edge computing in a many-to-many mode. A delay-sensitive satellite elastic Internet architecture is constructed based on SDN/NFV technology and the TSN security protocol, a corresponding delay optimization model is derived, and a delay optimization algorithm built on a deep reinforcement learning architecture is used to solve for the resource scheduling strategy. This effectively overcomes the application defects of existing satellite resource scheduling schemes: the computing and storage resources of the satellites are fully utilized, prioritized service is provided, average delay performance is optimized, and genuinely effective load balancing is achieved.

In order to achieve the above objective, it is necessary to provide a satellite elastic internet resource scheduling method, system, computer device and storage medium for the above technical problems.

In a first aspect, an embodiment of the present invention provides a satellite elastic internet resource scheduling method, where the method includes the following steps:

establishing a delay-sensitive satellite elastic Internet architecture based on a many-to-many mode between LEO satellites and user terminals; each LEO satellite corresponds to one MEC server;

establishing a satellite elastic Internet resource scheduling model according to the time delay sensitive satellite elastic Internet architecture;

establishing a minimum time delay optimization model according to the satellite elastic Internet resource scheduling model;

converting the minimized time delay optimization model into a corresponding Markov decision model;

and solving the Markov decision model to obtain a resource scheduling strategy.

In a second aspect, an embodiment of the present invention provides a satellite elastic internet resource scheduling system, where the system includes:

the architecture construction module is used for establishing a time delay sensitive satellite elastic Internet architecture based on the LEO satellite and the many-to-many mode of the user side; the LEO satellite corresponds to one MEC server;

the first modeling module is used for establishing a satellite elastic Internet resource scheduling model according to the time delay sensitive satellite elastic Internet architecture;

the second modeling module is used for establishing a minimum time delay optimization model according to the satellite elastic Internet resource scheduling model;

the model conversion module is used for converting the minimized time delay optimization model into a corresponding Markov decision model;

and the strategy solving module is used for solving the Markov decision model to obtain a resource scheduling strategy.

In a third aspect, embodiments of the present invention further provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.

The present application provides a satellite elastic Internet resource scheduling method, system, computer device and storage medium. The method establishes a delay-sensitive satellite elastic Internet architecture based on a many-to-many mode between LEO satellites and user terminals, establishes a satellite elastic Internet resource scheduling model according to that architecture, establishes a minimum-delay optimization model according to the resource scheduling model, converts the minimum-delay optimization model into a corresponding Markov decision model, and solves the Markov decision model to obtain a resource scheduling strategy. Compared with the prior art, the method models the optimization problem comprehensively and effectively for the real application scenario of limited resources and task queuing delay; it makes full use of the computing and storage resources of the satellites and avoids the performance impact of queuing delay; by dividing tasks by priority, it meets the diversified requirements of user terminals and improves service quality; and, through a delay optimization algorithm purpose-built on a deep reinforcement learning algorithm, it improves the computational efficiency of scheduling and allocating resources for massive numbers of services. It thereby achieves intelligent and efficient satellite elastic Internet resource scheduling with prioritized service, improved delay performance and guaranteed load balancing, and has strong generalization capability and high practical value.

Drawings

FIG. 1 is a schematic diagram of an application scenario of a satellite elastic Internet resource scheduling method in an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a satellite elastic Internet resource scheduling method in an embodiment of the invention;

FIG. 3 is a schematic diagram of a time delay sensitive satellite elastic Internet architecture in accordance with an embodiment of the present invention;

FIG. 4 is a pseudo code schematic diagram of a DDRA algorithm designed based on a TD3 architecture of a deep reinforcement learning algorithm in an embodiment of the present invention;

FIG. 5 is a schematic diagram of relevant parameters of a satellite elastic Internet resource scheduling model according to an embodiment of the invention;

FIG. 6 is a graph of the effect of discount rate on DDRA algorithm convergence in an embodiment of the present invention;

FIG. 7 is a graph comparing the convergence of the DDRA algorithm and the SAC algorithm in an embodiment of the present invention;

FIG. 8 is a graph comparing, for different algorithms, the effect of task computation amount on average delay performance under computing resource allocation when the task data amount follows a uniform distribution, in an embodiment of the present invention;

FIG. 9 is a graph comparing, for different algorithms, the effect of task computation amount on average delay performance under computing resource allocation when the task data amount follows a normal distribution, in an embodiment of the present invention;

FIG. 10 is a graph comparing, for different algorithms, the effect of task computation amount on average delay performance under computing resource allocation when the task computation amount follows a uniform distribution, in an embodiment of the present invention;

FIG. 11 is a graph comparing, for different algorithms, the effect of task data amount on average delay performance under computing resource allocation when the task computation amount follows a normal distribution, in an embodiment of the present invention;

FIG. 12 is a graph comparing, for different algorithms, the effect of the number of users on average delay performance under computing resource allocation when both the task data amount and the task computation amount follow a normal distribution, in an embodiment of the present invention;

FIG. 13 is a schematic diagram of a satellite elastic Internet resource scheduling system according to an embodiment of the present invention;

FIG. 14 is an internal structural diagram of a computer device in an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantageous effects of the present application more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples, and it should be understood that the examples described below are only illustrative of the present invention and are not intended to limit the scope of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The satellite elastic Internet resource scheduling method provided by the invention can be applied to the satellite elastic Internet resource scheduling scenario shown in FIG. 1, in which user terminals and MEC servers operate in a many-to-many mode, to achieve intelligent and efficient satellite elastic Internet resource scheduling with prioritized service, improved delay performance and guaranteed load balancing; the following embodiments describe the method in detail.

In one embodiment, as shown in fig. 2, a satellite elastic internet resource scheduling method is provided, which includes the following steps:

S11, establishing a delay-sensitive satellite elastic Internet architecture based on a many-to-many mode between LEO satellites and user terminals; each LEO satellite corresponds to one MEC server, and the MEC servers communicate and interact with one another to achieve load balancing; the delay-sensitive satellite elastic Internet architecture can be understood as the satellite Internet architecture shown in FIG. 3, which is based on SDN/NFV technology and incorporates the basic idea of IEEE 802.1Qcc in TSN: the MEC server of each LEO satellite provides a task offloading service for ground data nodes (user terminals), and functions such as satellite resource configuration, routing configuration and network configuration are managed in real time on the basis of a converged network, so that the architecture meets the diversified task requirements of the user terminals as far as possible; specifically, the step of establishing the delay-sensitive satellite elastic Internet architecture based on the many-to-many mode between LEO satellites and user terminals includes:

virtualizing, based on SDN/NFV technology, the computing resources and storage resources of the MEC server corresponding to each LEO satellite, and combining the TSN security protocol with various delay-related protocols, so as to establish the principle that satellite resource configuration, route forwarding and network configuration are managed with minimum delay as the optimization target, thereby establishing the delay-sensitive satellite elastic Internet architecture.

S12, establishing a satellite elastic Internet resource scheduling model according to the delay-sensitive satellite elastic Internet architecture; the satellite elastic Internet resource scheduling model can be understood as a scheduling model, built on the architecture of step S11, that considers on the data plane a scenario in which multiple LEO satellites cover multiple ground data nodes (user terminals) and determines the resource allocation strategy used when user terminals offload tasks; it mainly comprises a communication model, a task model and a calculation model; the task model takes task priority into account, and the calculation model can be established according to a preset task division principle and mainly comprises a local offloading task calculation model and an MEC offloading task calculation model; specifically, the step of establishing the satellite elastic Internet resource scheduling model according to the delay-sensitive satellite elastic Internet architecture includes:

acquiring the data transmission rate at which each user terminal uploads to each MEC server, and constructing the communication model from the data transmission rates based on the many-to-many mode; the communication model is expressed as:

in the formula,

where the two sets denote the set of user terminals and the set of MEC servers, respectively; the model gives the transmission delay and the transmission energy consumption incurred when user terminal u_i offloads a task to MEC server b_j in time slot t, and r_{i,j}(t), I_{i,j}, h_{i,j}(t) and s_{i,j} denote the corresponding transmission rate, inter-cell interference power, channel gain and straight-line distance, respectively; W denotes the channel bandwidth; σ² denotes the noise power of the user equipment; z_i(t) denotes the data size of the task Q_i(t) generated by user terminal u_i in time slot t; c denotes the speed of light; and p_i(t) denotes the transmission power of the signal sent by user terminal u_i in time slot t;

it should be noted that, because the data volume of a processed task is very small, the communication model of the invention does not consider the energy consumption or delay of downloading; in addition, in the many-to-many scenario of multiple user terminals and multiple LEO nodes, the handover delay of satellite communication is negligible under existing conditions, and because the user terminals are far from the LEO satellites, the distance from a user terminal to an MEC server can be taken as approximately equal to its distance to the corresponding LEO satellite; the following task model is built considering only the end-edge model, without the influence of the cloud;
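As an illustration of how the quantities defined for the communication model fit together, the following Python sketch assumes the usual Shannon-capacity form for the uplink rate, an upload-plus-propagation form for the transmission delay, and transmit power multiplied by upload time for the transmission energy. These closed forms, the function names and the numerical values are assumptions made for illustration only; they are not taken from the expressions of the invention.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c, in m/s


def transmission_rate(W, p, h, noise_power, interference):
    """Assumed Shannon-type uplink rate r_{i,j}(t) in bit/s."""
    sinr = p * h / (noise_power + interference)
    return W * math.log2(1.0 + sinr)


def transmission_delay(z, rate, distance):
    """Assumed delay: upload time z/r plus propagation time s/c."""
    return z / rate + distance / SPEED_OF_LIGHT


def transmission_energy(p, z, rate):
    """Assumed energy: transmit power multiplied by upload time."""
    return p * z / rate


# Hypothetical values chosen only so the arithmetic is easy to follow.
rate = transmission_rate(W=1e6, p=3.0, h=1.0, noise_power=0.5, interference=0.5)
delay = transmission_delay(z=4e6, rate=rate, distance=600e3)
energy = transmission_energy(p=3.0, z=4e6, rate=rate)
```

Download delay and energy are deliberately left out of the sketch, in line with the note that the communication model does not consider them.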

based on a load balancing principle, constructing a task model according to the task calculation amount, the task data amount and the task priority; the task model is expressed as:

Q_i(t) = {ω_i(t), z_i(t), pri_i(t)}

where Q_i(t) denotes the task generated by user terminal u_i in time slot t; ω_i(t) denotes the amount of computation required by task Q_i(t), i.e., the number of CPU cycles required to complete the task; z_i(t) denotes the data size of task Q_i(t); pri_i(t) denotes the priority of task Q_i(t), with pri_i(t) ∈ {1, 2, …, PN}, where PN denotes the number of priorities;
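The task triple can be pictured as a small record type. The sketch below is one possible Python representation; the class name `Task`, the helper `make_task` and the sample values are hypothetical and serve only to show the three fields and the priority range 1..PN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Task Q_i(t) generated by user terminal u_i in time slot t."""
    omega: float  # computation amount required by the task
    z: float      # data size of the task
    pri: int      # priority, an integer in 1..PN


def make_task(omega, z, pri, num_priorities):
    """Build a task, rejecting priorities outside 1..PN."""
    if not 1 <= pri <= num_priorities:
        raise ValueError(f"priority {pri} outside 1..{num_priorities}")
    return Task(omega=omega, z=z, pri=pri)


# Hypothetical example with PN = 3 priority levels.
task = make_task(omega=1e9, z=4e6, pri=2, num_priorities=3)
```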

dividing the task of each user terminal into a local offloading task and an MEC offloading task, and constructing a corresponding local offloading task calculation model and MEC offloading task calculation model, respectively; the local offloading task calculation model can be understood as a model of the processing delay and energy consumption incurred when the user equipment processes the local offloading task itself, and can be expressed as:

where the two quantities denote, respectively, the processing delay and the corresponding energy consumption of user terminal u_i for the local task Q_i^L; f_i^L denotes the local CPU frequency of user terminal u_i; and ρ_i denotes the power coefficient, i.e., the energy consumed by each CPU cycle;
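A minimal sketch of the local model follows, assuming that the local processing delay is the computation amount divided by the local CPU frequency and that the energy is the per-cycle energy coefficient multiplied by the number of cycles. Both closed forms and the function names are assumptions for illustration, not the expressions of the invention.

```python
def local_delay(omega_local, f_local):
    """Assumed local processing delay: cycles divided by CPU frequency f_i^L."""
    return omega_local / f_local


def local_energy(omega_local, rho):
    """Assumed local energy: energy per cycle rho_i times the number of cycles."""
    return rho * omega_local


# Hypothetical example: 2e9 cycles on a 1 GHz CPU at 1 nJ per cycle.
delay_local = local_delay(omega_local=2e9, f_local=1e9)
energy_local = local_energy(omega_local=2e9, rho=1e-9)
```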

the MEC offloading task calculation model can be understood as an MEC server processing delay model that accounts not only for the processing delay of the tasks offloaded by a user terminal to the MEC server but also for the queuing delay of those tasks, since several user terminals may offload tasks to the same MEC server and the tasks offloaded by different user terminals may have different priorities; it is expressed as:

where the model gives the processing delay incurred when user terminal u_i offloads a task to MEC server b_j in time slot t; its terms are the CPU frequency of MEC server b_j, the proportion of computing resources that MEC server b_j allocates to the MEC offloading task Q_i^E in time slot t, and the average queuing delay of an MEC offloading task Q_i^E with priority pri_i; it should be noted that the energy consumed by the MEC server in processing tasks is not considered here;
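The MEC-side delay can be sketched as a service term plus a queuing term. The form below, in which the task's cycles are divided by the share of the server's CPU frequency allocated to it and the average queuing delay is then added, is an assumption consistent with the quantities listed for the model; the function name and values are hypothetical.

```python
def mec_processing_delay(omega_mec, kappa, f_mec, queue_delay):
    """Assumed MEC delay: cycles / (allocated share * server frequency) + queuing delay."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("allocated resource proportion must lie in (0, 1]")
    return omega_mec / (kappa * f_mec) + queue_delay


# Hypothetical example: 1e9 cycles, 25% of an 8 GHz server, 0.1 s average queuing delay.
delay_mec = mec_processing_delay(omega_mec=1e9, kappa=0.25, f_mec=8e9, queue_delay=0.1)
```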

The calculation of the above average queuing delay depends on the queuing model actually adopted. For example, assume that task queuing follows a non-preemptive priority queuing model (M/M/N queue) in which tasks of the same priority are served on a first-come, first-served basis; that in any time slot the arrivals of tasks of any priority at the queue follow a Poisson distribution with parameter λ_i(t); that the processing time of the MEC server follows an exponential distribution with parameter μ(t); and that, with multiple servers, the storage space is large enough. Since the task model defines PN priorities, the average queuing delay of a task Q_i with priority pri_i is:

in the formula,

the total arrival rate is:

and the constraints include:

where O_j denotes the set of computing tasks offloaded to MEC server b_j.

Specifically, the average queuing delay of tasks Q_i of different priorities at MEC server b_j can be understood as follows:

when the task priority pri = 1,

when the task priority pri = 2,

and so on by analogy for the other priority levels; when the task priority pri = PN,

the corresponding task queuing delay is calculated accordingly.
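For a concrete feel of per-priority queuing delay, the sketch below uses the textbook result for a non-preemptive priority M/M/N queue in which all classes share one service rate μ: the waiting probability comes from the Erlang C formula, and the mean wait of class k is that probability divided by Nμ and by (1 − σ_{k−1})(1 − σ_k), where σ_k is the cumulative utilization of classes 1..k. This is offered as an assumption about a suitable queuing formula, not as the expression of the invention; treating index 0 (priority 1) as the highest priority is likewise an assumption.

```python
import math


def erlang_c(n_servers, offered_load):
    """Probability that an arriving task has to wait in an M/M/N queue."""
    rho = offered_load / n_servers
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    head = sum(offered_load ** k / math.factorial(k) for k in range(n_servers))
    tail = offered_load ** n_servers / (math.factorial(n_servers) * (1.0 - rho))
    return tail / (head + tail)


def priority_queue_delays(arrival_rates, mu, n_servers):
    """Mean queuing delay per priority class; index 0 is served first (assumption)."""
    p_wait = erlang_c(n_servers, sum(arrival_rates) / mu)
    capacity = n_servers * mu
    delays, sigma_prev = [], 0.0
    for lam in arrival_rates:
        sigma = sigma_prev + lam / capacity  # cumulative utilization up to this class
        delays.append(p_wait / capacity / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return delays


# Hypothetical example: two priority classes sharing a single server.
delays = priority_queue_delays(arrival_rates=[0.25, 0.25], mu=1.0, n_servers=1)
```

With a single priority class the function reduces to the ordinary M/M/N mean waiting time, which is a convenient sanity check.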

S13, establishing a minimum-delay optimization model according to the satellite elastic Internet resource scheduling model; the minimum-delay optimization model is a delay optimization model that accounts for the dynamic nature of task generation at the nodes and takes as its optimization target the minimization of the average processing delay of all tasks generated within a preset time range corresponding to the set of time slots; specifically, the step of establishing the minimum-delay optimization model according to the satellite elastic Internet resource scheduling model includes:

calculating the average task processing delay of each time slot within the preset time range according to the satellite elastic Internet resource scheduling model; the average task processing delay of each time slot can be calculated from the transmission delay in the communication model and the MEC offloading task calculation model; for example, the total delay generated by all tasks in time slot t is calculated as:

where l denotes the total number of user terminals that offload tasks to the MEC servers in time slot t;

based on the total delay thus obtained, the average task processing delay of each user terminal can be calculated as

averaging the average task processing delays of all time slots to obtain the average task processing delay of the preset time range, where the average task processing delay of the preset time range corresponding to the time slot set T is:

taking the minimization of the average task processing delay over the preset time range as the optimization target, constructing the minimum-delay optimization model; the objective function of the minimum-delay optimization model is expressed as:

in the formula,

where d(t) denotes the total delay generated by all tasks in time slot t; the two delay terms denote, respectively, the transmission delay and the processing delay incurred when user terminal u_i offloads a task to MEC server b_j in time slot t; l denotes the total number of user terminals; and κ denotes the matrix of computing resource proportions that each MEC server allocates to the different user terminals;

the constraint condition of the minimum time delay optimization model is expressed as follows:

where O_j denotes the set of computing tasks offloaded to MEC server b_j; the energy term denotes the transmission energy consumption incurred when user terminal u_i offloads a task to MEC server b_j in time slot t; T denotes the total number of time slots; E_i denotes the upper limit of the transmission energy consumption of user terminal u_i; and the resource term denotes the proportion of computing resources that MEC server b_j allocates to task Q_i of user terminal u_i in time slot t;
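The averaging described for the objective can be sketched as follows: the total delay of a time slot is the sum of transmission and processing delays over the offloading user terminals, the slot average divides by their number, and the objective averages over all time slots. The data layout (a list of per-slot lists of delay pairs) and the function name are hypothetical.

```python
def average_delay(slot_delays):
    """Average task processing delay over a set of time slots.

    slot_delays: one entry per time slot, each a non-empty list of
    (transmission_delay, processing_delay) pairs, one per offloading terminal.
    """
    per_slot = []
    for pairs in slot_delays:
        total = sum(tx + proc for tx, proc in pairs)  # total delay d(t)
        per_slot.append(total / len(pairs))           # average over l terminals
    return sum(per_slot) / len(per_slot)              # average over the time slots


# Hypothetical example: two slots, with two and one offloading terminals.
objective = average_delay([[(1.0, 2.0), (0.5, 0.5)], [(1.0, 1.0)]])
```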

S14, converting the minimum-delay optimization model into a corresponding Markov decision model; the Markov decision process (MDP) model can be understood as a 4-tuple (S, A, R, χ), where S denotes the state space, A denotes the action space, R denotes the reward function and χ ∈ [0, 1] denotes the discount coefficient; specifically, the step of converting the minimum-delay optimization model into the corresponding Markov decision model includes:

constructing the state space of the Markov decision model according to the environment state of each time slot; the environment state of each time slot is expressed as:

where s(t) denotes the environment state of time slot t; ω(t), z(t) and pri(t) denote, respectively, the computation amounts, data sizes and priorities of all tasks in time slot t; and the remaining component denotes the offloading strategy of the user terminals;

constructing the action space of the Markov decision model according to the agent's action in each time slot; the agent's action in each time slot is expressed as:

a(t) = κ(t)

where a(t) denotes the agent's action in time slot t; κ(t) denotes the proportions of computing resources that each MEC server allocates to the different user terminals in time slot t;

constructing the reward function of the Markov decision model according to the agent's reward in each time slot; the agent's action reward is expressed as:

where r(t) denotes the agent's action reward in time slot t;

it should be noted that a conventional MDP problem maximizes the cumulative reward, whereas the minimum-delay optimization model of the invention aims to minimize the average delay; therefore, based on the objective function, the negative of the delay is used as the reward function, and the reward is set to a minimum value when the constraints are not satisfied.
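The reward rule can be sketched as below. The specific constraints checked (each allocated proportion lies in [0, 1], the proportions handed out by one MEC server sum to at most 1, and each user terminal stays within its transmission energy limit) and the constant `PENALTY` standing in for the "minimum value" are assumptions for illustration.

```python
PENALTY = -1e6  # hypothetical stand-in for the "minimum value" of the reward


def constraints_satisfied(kappa, energy_used, energy_cap):
    """kappa[j][i]: share of MEC server b_j given to user terminal u_i in this slot."""
    for shares in kappa:
        if any(k < 0.0 or k > 1.0 for k in shares):
            return False
        if sum(shares) > 1.0 + 1e-9:  # a server cannot hand out more than 100%
            return False
    return all(used <= cap for used, cap in zip(energy_used, energy_cap))


def reward(avg_delay, kappa, energy_used, energy_cap):
    """Negative delay when feasible, a large penalty otherwise."""
    if constraints_satisfied(kappa, energy_used, energy_cap):
        return -avg_delay
    return PENALTY


# Hypothetical example: one server splitting its resources between two terminals.
r_ok = reward(0.6, kappa=[[0.5, 0.5]], energy_used=[1.0, 1.5], energy_cap=[2.0, 2.0])
r_bad = reward(0.6, kappa=[[0.7, 0.5]], energy_used=[1.0, 1.5], energy_cap=[2.0, 2.0])
```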

S15, solving the Markov decision model to obtain the resource scheduling strategy; various existing methods can be used to solve the Markov decision model; to ensure efficient and accurate solving, this embodiment preferably builds a DDRA algorithm on the framework of the reinforcement learning algorithm TD3 and outputs the resource scheduling strategy through the trained neural network model; specifically, the step of solving the Markov decision model to obtain the resource scheduling strategy includes:

constructing a delay optimization DDRA algorithm based on the framework of the reinforcement learning algorithm TD3, and solving the Markov decision model with the delay optimization DDRA algorithm to obtain the resource scheduling strategy.

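When the Markov decision model is solved based on the reinforcement learning algorithm TD3, the following six networks need to be trained: the Actor network, the Target Actor network, the first Critic network, the first Target Critic network, the second Critic network and the second Target Critic network. The networks are described as follows: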

the parameter of the Actor network is φ; its input is the current training environment state s(t), and its output is the current agent policy π(a(t)|s(t); φ), i.e., the action probability distribution of the current time slot; the parameter of the corresponding Target Actor network is φ′; its input is the current implementation environment state s′(t), and its output is the current agent policy π(a′(t)|s′(t); φ′), i.e., the action probability distribution of the current time slot;

the parameter of the first Critic network is θ₁; its inputs are the current agent policy π(a(t)|s(t); φ) and the current training environment state s(t), and its output is the Q function of taking action a(t) in the current state, i.e., the cumulative expected value of taking a particular resource allocation action in the current satellite communication environment state; the parameter of the corresponding first Target Critic network is θ′₁; its inputs are the current agent policy π(a′(t)|s′(t); φ′) and the implementation environment state s′(t), and its output is the Q function of taking action a′(t) in the current state, i.e., the cumulative expected value of taking a particular resource allocation action in the current satellite communication environment state;

the parameter of the second Critic network is θ₂; its inputs are the current agent policy π(a(t)|s(t); φ) and the current training environment state s(t), and its output is the Q function of taking action a(t) in the current state, i.e., the cumulative expected value of taking a particular resource allocation action in the current satellite communication environment state; the parameter of the corresponding second Target Critic network is θ′₂; its inputs are the current agent policy π(a′(t)|s′(t); φ′) and the implementation environment state s′(t), and its output is the Q function of taking action a′(t) in the current state;

the pseudo code of the process of training the neural networks with the delay optimization DDRA algorithm, constructed on the framework of the reinforcement learning algorithm TD3, is shown in FIG. 4, and the process comprises the following steps:

1) Using an Actor network to interact with the environment, and storing result tuples { s (t), a (t), r (t), s (t+1) } obtained by each step of interaction into a cache;

2) Randomly sampling a batch of tuples {s(t), a(t), r(t), s(t+1)} from the cache, and calculating a(t+1) and the Q function; the action a(t+1) taken in the current state is:

correspondingly, the Q function in the current state is:

where σ′ is the variance, c is the upper limit, and γ is the Q function update rate;

3) Calculating and updating the parameter θ₁ of the first Critic network and the parameter θ₂ of the second Critic network; the corresponding update rules are as follows:

parameter of the first Critic network:

parameter of the second Critic network:

4) At a preset interval of d steps, updating the parameter φ of the Actor network, the parameter φ′ of the Target Actor network, the parameter θ′₁ of the first Target Critic network and the parameter θ′₂ of the second Target Critic network; the corresponding update rules are as follows:

parameter of the Actor network:

parameter of the Target Actor network: φ′ ← τφ + (1−τ)φ′

parameter of the first Target Critic network: θ′₁ ← τθ₁ + (1−τ)θ′₁

parameter of the second Target Critic network: θ′₂ ← τθ₂ + (1−τ)θ′₂

where

η is the Actor network update rate and τ is the Target network update rate;

5) Repeating the steps until the neural network converges.
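Several of the formulas in steps 2) to 4) are given only as figures. For orientation, the block below writes out the forms these updates take in the generic TD3 algorithm, using the symbols defined in the steps above. It is a sketch under that assumption and is not recovered from the patent's own formulas: the noise term ε and the mini-batch size N are symbols introduced here, and the ascent direction of the Actor update assumes the reward is to be maximized.

```latex
% Generic TD3 update rules (assumed form, not taken from the patent figures).
% \epsilon (smoothing noise) and N (mini-batch size) are symbols introduced here.
\begin{aligned}
a(t+1) &= \pi_{\phi'}\bigl(s(t+1)\bigr) + \epsilon,
  \qquad \epsilon \sim \operatorname{clip}\bigl(\mathcal{N}(0, \sigma'),\, -c,\, c\bigr) \\
y(t) &= r(t) + \gamma \min_{i \in \{1,2\}} Q_{\theta'_i}\bigl(s(t+1),\, a(t+1)\bigr) \\
\theta_i &\leftarrow \arg\min_{\theta_i} \frac{1}{N} \sum
  \bigl(y(t) - Q_{\theta_i}(s(t), a(t))\bigr)^2, \qquad i = 1, 2 \\
\phi &\leftarrow \phi + \eta\, \frac{1}{N} \sum
  \nabla_a Q_{\theta_1}\bigl(s(t), a\bigr)\Big|_{a = \pi_\phi(s(t))}
  \nabla_\phi \pi_\phi\bigl(s(t)\bigr)
\end{aligned}
```

Taking the minimum over the two Target Critic networks is what limits over-estimation of the Q function, and updating the Actor only every d steps lets the Critic estimates settle before the policy moves.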

In order to make a task offloading decision, the Actor network needs to train for many times according to the steps and update and train the first Critic network, the second Critic network, the Target Actor network, the first Target Critic network and the second Target Critic network by combining experiences in the cache; and finally, determining a resource scheduling strategy by adopting the Target Actor network, the first Target Critic network and the second Target Critic network which are obtained through training.
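The building blocks of one training iteration can be sketched in plain Python as follows. This is a minimal illustration that assumes the generic TD3 mechanics (clipped target noise, the smaller of two target Q values, soft target updates, delayed Actor updates); the function names, the action bounds and the use of plain callables in place of neural networks are hypothetical and are not the pseudocode of FIG. 4.

```python
import random


def smoothed_target_action(target_actor, next_state, sigma, c,
                           low=0.0, high=1.0, rng=random):
    """Target action plus clipped Gaussian noise (target policy smoothing).

    `sigma` is a standard deviation here; if sigma' in the text is a
    variance, pass its square root. `low`/`high` are assumed action bounds.
    """
    noisy_action = []
    for component in target_actor(next_state):
        noise = max(-c, min(c, rng.gauss(0.0, sigma)))
        noisy_action.append(max(low, min(high, component + noise)))
    return noisy_action


def td3_target(reward, next_state, next_action,
               target_critic_1, target_critic_2, gamma):
    """Clipped double-Q target: bootstrap from the smaller target estimate."""
    q1 = target_critic_1(next_state, next_action)
    q2 = target_critic_2(next_state, next_action)
    return reward + gamma * min(q1, q2)


def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', element by element."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


def should_update_actor(step, d):
    """Actor and Target networks are updated only every d Critic updates."""
    return step % d == 0


if __name__ == "__main__":
    # Toy stand-ins for the networks: constant policy and constant critics.
    actor = lambda state: [0.5, 0.5]
    critic_a = lambda state, action: -2.0
    critic_b = lambda state, action: -1.5
    a_next = smoothed_target_action(actor, None, sigma=0.1, c=0.2)
    print(td3_target(-1.0, None, a_next, critic_a, critic_b, gamma=0.75))
```

In a real implementation the callables would be the Target Actor and Target Critic networks, and `soft_update` would be applied to their weight tensors rather than to flat lists.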

The method addresses resource scheduling among edge servers under satellite network edge computing. A delay-sensitive satellite elastic Internet architecture is built on SDN/NFV technology and the TSN security protocol, a corresponding delay optimization model is derived from it, and the model is solved with a delay optimization algorithm built on a deep reinforcement learning framework to obtain the resource scheduling strategy. Because the optimization problem is modeled comprehensively for a practical scenario with limited resources and task queuing delay, the computing and storage resources of the satellites can be fully utilized and the performance impact of queuing delay is avoided. Task priorities can be distinguished, so the diversified requirements of user terminals are met and service quality is improved. The purpose-built delay optimization algorithm also improves the computational efficiency of scheduling and allocating resources for massive services. The result is intelligent and efficient satellite elastic Internet resource scheduling with priority-based service, improved delay performance and guaranteed load balancing, which effectively overcomes the application defects of existing satellite resource scheduling schemes and has strong generalization capability and high practical value.

To verify the application effect and performance of the DDRA scheme, a simulation comparison experiment was also carried out. The simulation process is as follows:

The simulation platform uses Python 3.9. Three LEO satellites at an altitude of 784 km fly over a square area of 1200 m × 1200 m, and by default 24 user terminals (the number is adjustable) are randomly distributed on the ground. Each user terminal can offload a task only to the MEC server of one LEO satellite or process it locally. Because the altitude is far greater than the extent of the ground area, the distance between each user terminal and an MEC server is approximated by the altitude of the LEO satellite. Because the LEO satellites considered form a constellation, the loss from channel switching and the influence of the communication window are negligible; that is, a user terminal can be considered to be in communication with an LEO satellite at all times, and the channel gain can be acquired in advance through sensing technology. The transmission power of a user terminal is set to 23 dBm, the channel bandwidth is 20 MHz, and a free-space fading channel model is used. The task parameters considered are mainly the calculation amount, the data amount, the priority and the possible energy consumption. The task offloading decision of the previous time slot must be taken into account in the calculation, and the amount of computing resources that each LEO satellite can provide in each time slot is assumed to be a random value within a range. In the DDTO algorithm, after careful tuning, every neural network has 4 layers: 1 input layer, 2 hidden layers and 1 output layer. The hidden layers of the actor network have 2048 and 1024 neurons respectively, and the hidden layers of the critic networks have 1024 and 512 neurons respectively. The model is trained with a learning rate of 0.001 and a discount factor of 0.75. The remaining parameter settings are detailed in FIG. 5;
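As a rough illustration of the link parameters above, the following sketch computes a free-space path loss and a Shannon-type uplink rate for a terminal at the satellite altitude. Several inputs are assumptions that do not appear in the text: the 2 GHz carrier frequency, the -174 dBm/Hz noise density, the absence of antenna gains, and the use of the Shannon capacity formula for the rate. The remaining settings of FIG. 5 are not reproduced here.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m, carrier_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * carrier_hz / SPEED_OF_LIGHT_M_S
    )


def uplink_rate_bps(tx_power_dbm, bandwidth_hz, distance_m, carrier_hz,
                    noise_dbm_per_hz=-174.0):
    """Shannon-type rate B * log2(1 + SNR) with free-space loss only.

    Antenna gains are taken as 0 dB and the noise density is an assumed
    thermal-noise value; both are simplifications for illustration.
    """
    rx_power_dbm = tx_power_dbm - free_space_path_loss_db(distance_m, carrier_hz)
    noise_dbm = noise_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)
    snr_linear = 10.0 ** ((rx_power_dbm - noise_dbm) / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    altitude_m = 784e3   # distance approximated by the LEO altitude
    carrier_hz = 2e9     # hypothetical carrier frequency
    loss = free_space_path_loss_db(altitude_m, carrier_hz)
    rate = uplink_rate_bps(23.0, 20e6, altitude_m, carrier_hz)
    print(f"path loss: {loss:.1f} dB, rate: {rate / 1e3:.1f} kbit/s")
```

Because the distance is fixed at the altitude, the path loss is the same for every user terminal in this approximation, so differences in transmission delay come from the task data amount alone.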

Simulation A: because the action space is continuous, the SAC algorithm, whose performance is similar to that of the DDRA algorithm, is selected, and the convergence rates of the DDRA algorithm and the other reinforcement learning algorithm are analyzed and compared. The SAC algorithm is likewise an improvement based on the DQN algorithm and likewise guards against over-estimation in the calculation of the Critic network. The difference is that SAC introduces the policy entropy into the reward function, that is, it encourages the Agent to explore more while maximizing rewards. The design intent is to let the Agent search for the global optimal solution rather than a local optimal solution as far as possible, which takes more training time and makes the training process less efficient;

Simulation B: the optimized delay results of the DDRA algorithm and other resource scheduling algorithms (the SAC algorithm and a local optimization algorithm) are analyzed and compared. The model based on the local optimization algorithm (denoted LOA) does not consider the influence of priority and queuing delay. Under this model the optimization problem can be proved to be convex, and the optimal solution is obtained by solving the Lagrangian equation of the problem through the KKT conditions. In the LOA, the optimal solution is then substituted into the objective function of the minimum delay optimization model to obtain the average delay;

Simulation C: with the mean data amount fixed at 3.5 Mb, the calculation amount follows a uniform distribution and a normal distribution in turn, and the influence of the task calculation amount on the average delay performance under computing resource allocation is studied;

Simulation D: with the mean calculation amount fixed at 5.5 Gcycle, the data amount follows a uniform distribution and a normal distribution in turn, and the influence of the task data amount on the average delay performance under computing resource allocation is studied;

Simulation E: with the means of the task calculation amount and the task data amount set to 5 Gcycle and 1 Mb respectively and both following a normal distribution, the influence of the number of users on the average delay performance under computing resource allocation is studied.
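The task workloads used in Simulations C to E can be generated along the following lines. This is a sketch only: the spread of each distribution (20% of the mean), the number of priority levels (3) and the function names are hypothetical, since the text gives only the means and the distribution families.

```python
import random


def sample_amounts(n, mean, spread, distribution, rng):
    """Draw n positive task amounts around `mean`.

    `spread` is a hypothetical parameter: the half-range for the uniform
    case and the standard deviation for the normal case.
    """
    if distribution == "uniform":
        return [rng.uniform(mean - spread, mean + spread) for _ in range(n)]
    if distribution == "normal":
        amounts = []
        while len(amounts) < n:
            value = rng.gauss(mean, spread)
            if value > 0.0:  # redraw non-positive samples
                amounts.append(value)
        return amounts
    raise ValueError(f"unknown distribution: {distribution!r}")


def make_tasks(n_users, calc_mean_gcycle, data_mean_mb, distribution,
               priorities=3, seed=0):
    """One task per user terminal: calculation amount, data amount, priority."""
    rng = random.Random(seed)
    calc = sample_amounts(n_users, calc_mean_gcycle,
                          0.2 * calc_mean_gcycle, distribution, rng)
    data = sample_amounts(n_users, data_mean_mb,
                          0.2 * data_mean_mb, distribution, rng)
    return [{"omega": w, "z": z, "pri": rng.randint(1, priorities)}
            for w, z in zip(calc, data)]


if __name__ == "__main__":
    # Setting of Simulation E: means of 5 Gcycle and 1 Mb, normal distribution.
    tasks = make_tasks(n_users=24, calc_mean_gcycle=5.0,
                       data_mean_mb=1.0, distribution="normal")
    print(tasks[0])
```

Fixing the seed makes a comparison between scheduling algorithms use the same task set, which keeps differences in average delay attributable to the algorithm rather than to the draw.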

Based on the simulation experiment, the following results are obtained:

The analysis results shown in FIGS. 6-7 were obtained from the Simulation A experiment. FIG. 6 compares the effect of the discount factor on the convergence of the DDRA algorithm. In deep reinforcement learning, the discount factor is an important hyperparameter of the Markov decision process. With the discount factor set to 0.05, 0.75 and 0.95 in turn, it is found that a discount factor that is too large or too small is harmful: at 0.05 and 0.95 the average delay fluctuates considerably, which indicates that the system is unstable in both cases, and at 0.05 in particular a large peak appears at around episode = 50. At 0.75 the system converges stably and at a moderate speed, so this value is used as the default hyperparameter of the model in the subsequent analysis of the task calculation amount and the task data amount. The discount factor must therefore be chosen reasonably; otherwise the system becomes unstable or converges to an unsuitable value. FIG. 7 compares the convergence of different reinforcement learning algorithms under SMTOM: DDRA has already converged to about 1.8 s at episode = 100, whereas the SAC algorithm converges to about 2 s only after episode = 5000. The DDRA algorithm thus converges significantly faster than the SAC algorithm and reaches a lower final average delay, i.e. the DDRA has higher calculation efficiency and a better optimization result in the second scenario under the SMTOM, which is very important in satellite communication scenarios where resources are scarce.
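To make the role of the discount factor concrete, the sketch below computes a discounted return and the commonly used effective horizon 1/(1 - γ) for the three values compared in FIG. 6. It only illustrates how strongly each setting weights future rewards; it does not reproduce or explain the convergence curves, and the function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k, accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rough number of future steps that still matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    for gamma in (0.05, 0.75, 0.95):
        print(gamma, round(effective_horizon(gamma), 2))
```

With γ = 0.05 the Agent is almost myopic, with γ = 0.95 it weights a long horizon of uncertain future rewards, and γ = 0.75 sits between the two with a horizon of about four time slots.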

Based on Simulation B and Simulation C, the analysis results shown in FIGS. 8-9 were obtained. FIGS. 8-9 show that the average system delay tends to increase as the task calculation amount increases, and that the increase under the SAC algorithm is less stable. The DDRA proposed in the application and the local optimization algorithm LOA both perform better, with an average delay lower than that of the SAC algorithm. When the calculation amount is smaller than 4.5 Gcycle, the DDRA algorithm performs worse than LOA; when it is larger than 4.5 Gcycle, the DDRA algorithm performs better than LOA. The reason is that LOA does not consider queuing delay: when the task calculation amount is large, the processing time of each task increases and so does the queuing delay, and optimizing only part of the delay inevitably degrades the delay performance. This reflects the advantage of the DDRA algorithm in application scenarios with a large calculation amount and the advantage of SMTMO in satellite communication scenarios.
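The kind of closed-form solution that a KKT-based baseline such as LOA produces can be illustrated on a simplified problem. The sketch assumes an objective that is not taken from the patent: one MEC server with CPU frequency F splits its resources into shares κ_i, and the sum of processing delays ω_i / (κ_i F) is minimized subject to the shares summing to 1, with no queuing delay and no priorities. Setting the derivative of the Lagrangian to zero gives κ_i proportional to the square root of ω_i.

```python
import math


def kkt_allocation(workloads):
    """Shares minimizing sum(w_i / (k_i * F)) subject to sum(k_i) = 1.

    Stationarity of the Lagrangian gives k_i = sqrt(w_i) / sum_j sqrt(w_j).
    """
    roots = [math.sqrt(w) for w in workloads]
    total = sum(roots)
    return [r / total for r in roots]


def average_processing_delay(workloads, shares, cpu_hz):
    """Mean of w_i / (k_i * F); queuing delay is deliberately ignored."""
    delays = [w / (k * cpu_hz) for w, k in zip(workloads, shares)]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    workloads = [1e9, 4e9]   # cycles, hypothetical tasks
    cpu_hz = 1e10            # hypothetical MEC CPU frequency
    optimal = kkt_allocation(workloads)
    equal = [0.5, 0.5]
    print(average_processing_delay(workloads, optimal, cpu_hz))
    print(average_processing_delay(workloads, equal, cpu_hz))
```

The allocation depends only on the calculation amounts, so a baseline of this form has no term through which queuing delay or task priority could influence the result, which is consistent with its behaviour at large calculation amounts described above.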

In FIG. 8, where the data amount obeys a uniform distribution, the average delay of the SAC algorithm is 37.8% higher than that of the LOA algorithm and the average delay of the DDRA algorithm is 2.7% lower. FIG. 9 shows that, where the data amount obeys a normal distribution, the average delays of the algorithms differ little and are relatively stable when the calculation amount is low, and the delay performance of LOA is better than that of DDRA and SAC. After the calculation amount increases, the SAC algorithm produces results with larger variance, possibly because it fails to converge to the optimal solution within the limited number of training iterations, while the DDRA algorithm overtakes LOA in delay performance, which reflects the superiority of the DDRA algorithm at high task calculation amounts under SMTMO.

Based on Simulation B and Simulation D, the analysis results shown in FIGS. 10-11 were obtained. FIGS. 10-11 show that the task data amount has little influence on task offloading performance. As the data amount increases, the average delay of each algorithm shows only a slight upward trend, because the data amount affects only the transmission delay, which is usually a small part of the total delay, and under LOA the optimal value obtained through the gradient solution is independent of the task data amount. Of the three algorithms, the SAC algorithm shows larger jitter, whereas LOA and DDRA rise comparatively smoothly. In FIG. 10, where the calculation amount obeys a uniform distribution, the average delay of the SAC algorithm is 1.9% lower than that of LOA and the average delay of the DDRA algorithm is 11.5% lower. In FIG. 11, where the data amount obeys a normal distribution, the average delay of the SAC algorithm is 2.1% higher than that of LOA and the average delay of the DDRA algorithm is 11.2% lower. Therefore, at higher task calculation amounts, the average delay performance of the SAC algorithm is similar to that of LOA, while the DDRA provided by the invention is about 11% better than LOA.

Based on Simulation B and Simulation E, the analysis results shown in FIG. 12 were obtained. FIG. 12 shows that, in general, the average delay under each algorithm increases gradually with the number of users; for the local optimization algorithm in particular, the growth of the average delay resembles exponential growth. Comparing the three algorithms, the DDRA algorithm has the best average delay performance, with an average delay of 6.5 s across the different user numbers in the simulation. The SAC algorithm performs slightly worse, with an average delay of 6.8 s. The local optimization algorithm has an average delay of 13.1 s: because it ignores queuing delay, the effect is small when the number of users is small, but the average delay rises sharply as the number of users grows, so the delay performance with many user tasks degrades greatly and the user experience suffers. Overall, across the different numbers of user terminals, the average delay of the DDRA algorithm is 4.4% lower than that of the SAC algorithm and 50.4% lower than that of the local optimization algorithm.
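The two percentages follow directly from the three average delays quoted above, taking the compared algorithm as the baseline. The short check below recomputes them; the function name is hypothetical.

```python
def relative_reduction_percent(baseline_s, value_s):
    """How much lower `value_s` is than `baseline_s`, in percent."""
    return (baseline_s - value_s) / baseline_s * 100.0


# Average delays across user numbers reported for FIG. 12, in seconds.
ddra_s, sac_s, loa_s = 6.5, 6.8, 13.1

vs_sac = relative_reduction_percent(sac_s, ddra_s)   # (6.8 - 6.5) / 6.8
vs_loa = relative_reduction_percent(loa_s, ddra_s)   # (13.1 - 6.5) / 13.1

print(f"DDRA vs SAC: {vs_sac:.1f}% lower")
print(f"DDRA vs LOA: {vs_loa:.1f}% lower")
```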

Based on the results of the simulation experiments, the invention establishes a satellite elastic Internet architecture and a satellite elastic Internet resource scheduling model for the resource scheduling problem in the satellite elastic Internet scenario. On this basis it formulates an average delay objective under an energy constraint and proposes a technical scheme that solves the resulting non-convex optimization problem with the framework of the reinforcement learning algorithm TD3, so that the average delay of task offloading by user terminals is significantly reduced and truly effective satellite resource scheduling is achieved.

Although the steps in the flowcharts described above are shown in order as indicated by arrows, these steps are not necessarily executed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders.

In one embodiment, as shown in fig. 13, there is provided a satellite-resilient internet resource scheduling system, the system comprising:

the architecture construction module 1 is used for establishing a delay-sensitive satellite elastic Internet architecture based on a many-to-many mode between LEO satellites and user terminals; each LEO satellite corresponds to one MEC server;

the first modeling module 2 is used for establishing a satellite elastic Internet resource scheduling model according to the delay-sensitive satellite elastic Internet architecture;

the second modeling module 3 is used for establishing a minimum delay optimization model according to the satellite elastic Internet resource scheduling model;

the model conversion module 4 is used for converting the minimum delay optimization model into a corresponding Markov decision model;

and the strategy solving module 5 is used for solving the Markov decision model to obtain a resource scheduling strategy.
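When the modules are realized in software, their data flow can be sketched as a simple chain in which each module consumes the output of the previous one. The class and field names below are hypothetical, and the stages are left as injected callables because the internals of each module are defined by the method embodiments rather than by this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SchedulingPipeline:
    """Chains the five modules; each field is a caller-supplied stage."""
    build_architecture: Callable[[], Any]           # module 1
    build_scheduling_model: Callable[[Any], Any]    # module 2
    build_delay_model: Callable[[Any], Any]         # module 3
    to_markov_model: Callable[[Any], Any]           # module 4
    solve_policy: Callable[[Any], Any]              # module 5

    def run(self) -> Any:
        architecture = self.build_architecture()
        scheduling_model = self.build_scheduling_model(architecture)
        delay_model = self.build_delay_model(scheduling_model)
        markov_model = self.to_markov_model(delay_model)
        return self.solve_policy(markov_model)


if __name__ == "__main__":
    # Placeholder stages that only label their output.
    pipeline = SchedulingPipeline(
        build_architecture=lambda: "architecture",
        build_scheduling_model=lambda arch: f"model({arch})",
        build_delay_model=lambda model: f"delay({model})",
        to_markov_model=lambda delay: f"mdp({delay})",
        solve_policy=lambda mdp: f"policy({mdp})",
    )
    print(pipeline.run())
```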

For specific limitations of the satellite elastic Internet resource scheduling system, reference may be made to the limitations of the satellite elastic Internet resource scheduling method above, which are not repeated here. Each module in the satellite elastic Internet resource scheduling system may be implemented wholly or partly in software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

Fig. 14 shows an internal structural diagram of a computer device, which may be a terminal or a server in particular, in one embodiment. As shown in fig. 14, the computer device includes a processor, a memory, a network interface, a display, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a satellite-resilient internet resource scheduling method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those of ordinary skill in the art that the structure shown in FIG. 14 is merely a block diagram of part of the structure relevant to the present application and does not limit the computer device on which the present application may be implemented; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when the computer program is executed.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the above method.

In summary, the satellite elastic Internet resource scheduling method and system provided by the embodiments of the invention implement the following technical scheme: a delay-sensitive satellite elastic Internet architecture is established based on a many-to-many mode between LEO satellites and user terminals; a satellite elastic Internet resource scheduling model is established according to that architecture; a minimum delay optimization model is established according to the resource scheduling model; the minimum delay optimization model is converted into a corresponding Markov decision model; and the Markov decision model is solved to obtain a resource scheduling strategy.

In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the description of the method embodiments may be consulted. It should be noted that the technical features of the foregoing embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples represent only a few preferred embodiments of the present application, which are described in more detail and are not thereby to be construed as limiting the scope of the invention. It should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and substitutions should also be considered to be within the scope of the present application. Therefore, the protection scope of the patent application is subject to the protection scope of the claims.

Claims

1. The satellite elastic internet resource scheduling method is characterized by comprising the following steps of:

establishing a satellite elastic Internet resource scheduling model according to the time delay sensitive satellite elastic Internet architecture; the satellite elastic Internet resource scheduling model comprises a communication model, a task model and a calculation model; the computing model comprises a local off-load task computing model and an MEC off-load task computing model;

solving a Markov decision model to obtain a resource scheduling strategy;

the step of establishing a satellite elastic internet resource scheduling model according to the time delay sensitive satellite elastic internet architecture comprises the following steps:

in the formula ,

wherein ,

and

respectively represent the user terminal set and the MEC server set;

Q_i(t) = {ω_i(t), z_i(t), pri_i(t)}

wherein Q_i(t) denotes the task generated by user terminal u_i in time slot t; ω_i(t) denotes the amount of calculation required by task Q_i(t); z_i(t) denotes the data size of task Q_i(t); pri_i(t) denotes the priority of task Q_i(t), with pri_i(t) ∈ [1, 2, …, PN], where PN denotes the number of priorities;

dividing each user end task into a local offloading task and an MEC offloading task, and respectively constructing a corresponding local offloading task calculation model and an MEC offloading task calculation model; the local offload task computation model is expressed as:

wherein ,

and

the MEC off-load task calculation model is expressed as:

wherein ,

denotes the CPU frequency of MEC server b_j;

denotes the average queuing delay of the MEC offloading task Q_i^E with priority pri_i.

2. The method for scheduling satellite elastic internet resources according to claim 1, wherein the step of establishing a delay-sensitive satellite elastic internet architecture based on the LEO satellite and the many-to-many mode of the user terminal comprises:

3. The method for scheduling satellite elastic internet resources according to claim 1, wherein the step of establishing a minimum delay optimization model according to the satellite elastic internet resource scheduling model comprises:

calculating the task processing average time delay of each time interval in a preset time range according to the satellite elastic Internet resource scheduling model;

averaging the task processing average time delays of all the time slots to obtain the task processing average time delay of a preset time range;

in the formula ,

and

denotes the transmission energy consumption of user terminal u_i offloading its task to MEC server b_j in time slot t; T denotes the total number of time slots; E_i denotes the upper limit of the transmission energy consumption of user terminal u_i;

denotes the proportion of computing resources that MEC server b_j allocates to task Q_i of user terminal u_i in time slot t.

4. The satellite elastic internet resource scheduling method of claim 3, wherein the step of converting the minimized delay optimization model into a corresponding markov decision model comprises:

denotes the offloading strategy of the user terminal;

a(t) = κ(t)

where r (t) represents the Agent action rewards for time slot t.

5. The method for scheduling satellite elastic internet resources according to claim 1, wherein the step of solving a markov decision model to obtain a resource scheduling policy comprises:

6. A satellite-resilient internet resource scheduling system, the system comprising:

the first modeling module is used for establishing a satellite elastic Internet resource scheduling model according to the time delay sensitive satellite elastic Internet architecture; the satellite elastic Internet resource scheduling model comprises a communication model, a task model and a calculation model; the computing model comprises a local off-load task computing model and an MEC off-load task computing model;

the strategy solving module is used for solving the Markov decision model to obtain a resource scheduling strategy;

the establishing a satellite elastic internet resource scheduling model according to the time delay sensitive satellite elastic internet architecture comprises the following steps:

in the formula ,

wherein ,

and

respectively represent the user terminal set and the MEC server set;

Q_i(t) = {ω_i(t), z_i(t), pri_i(t)}

wherein Q_i(t) denotes the task generated by user terminal u_i in time slot t; ω_i(t) denotes the amount of calculation required by task Q_i(t); z_i(t) denotes the data size of task Q_i(t); pri_i(t) denotes the priority of task Q_i(t), with pri_i(t) ∈ [1, 2, …, PN], where PN denotes the number of priorities;

wherein ,

and

the MEC off-load task calculation model is expressed as:

wherein ,

denotes the CPU frequency of MEC server b_j;

7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 5 when the computer program is executed.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.