CN112422171A

CN112422171A - Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network

Info

Publication number: CN112422171A
Application number: CN202011251365.3A
Authority: CN
Inventors: 周笛; 王怡昕; 盛敏; 李建东; 吴家鑫; 戴诺伊; 王晨光; 白卫岗
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2021-02-26
Anticipated expiration: 2040-11-09
Also published as: CN112422171B

Abstract

The invention discloses an intelligent joint resource scheduling method for remote sensing satellite networks in uncertain environments, which addresses the optimization of remote sensing satellite transmission performance in a time-varying, unpredictable network environment. The implementation comprises the following steps: establishing a remote sensing satellite network model for an uncertain environment; generating environmental parameter data; initializing the required parameters; guiding satellite power allocation; pre-transitioning the network state; guiding satellite power pre-allocation; updating and checking the network parameters; updating the network state, the action and the current time slot number; and obtaining the network parameters that guide multi-dimensional resource scheduling. The invention obtains network environment parameter data for a given network scale and parameter setting through software simulation, defines a six-dimensional feature function, and linearly approximates the action-value function with a weight vector. It handles the continuous network state space, avoids overestimation during parameter updates, adapts to future remote sensing satellite networks operating in dynamically and randomly varying environments, and provides guidance for network planning and network optimization.

Description

Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network

Technical Field

The invention relates to the technical field of satellite communication, mainly to joint resource scheduling in remote sensing satellite networks, and in particular to an intelligent dynamic joint resource scheduling method for remote sensing satellite networks in uncertain environments. It can be used in remote sensing satellite networks operating in time-varying, unpredictable environments.

Background

Compared with terrestrial communication networks, satellite networks offer long communication distances and high communication quality, and their services are neither limited by geography nor affected by natural disasters. In recent years, demand for timely, high-precision, high-utility remote sensing data has grown rapidly, and the state has steadily increased its investment in building satellite remote sensing services. Remote sensing satellite networks increasingly demonstrate their value for social and economic development, their close ties to the national economic development strategy, and their large development potential. They are now widely used in agricultural assessment, ecological and environmental monitoring, weather forecasting, disaster prevention and mitigation, and other areas closely related to daily life. A remote sensing satellite network consists of remote sensing satellites, relay satellites and ground stations. Remote sensing data is acquired by a remote sensing satellite and transmitted to a ground station either directly or indirectly via a relay satellite. Resources are the foundation of remote sensing satellite network operation, and resource scheduling directly affects transmission performance, so research on multi-dimensional joint resource scheduling methods is of great significance.

Because of its orbital motion, a remote sensing satellite periodically alternates between the sunlit side of the Earth and the Earth's shadow. On the sunlit side, solar energy is available, but the amount supplied is random and unpredictable owing to factors such as solar panel degradation and ion radiation. In the Earth's shadow, no solar energy is available and energy can only be supplied by the on-board battery. Meanwhile, during normal operation the remote sensing satellite incurs both routine static energy consumption, which keeps its systems running, and dynamic energy consumption for task acquisition and transmission, which is governed by resource scheduling. In addition, because the satellites follow different orbits, the topology of the remote sensing satellite network is another time-varying factor that seriously affects task transmission. How to design an efficient multi-dimensional joint resource scheduling method that optimizes the long-term performance of the network is therefore an important and urgent research problem.

Existing resource scheduling methods can be broadly classified as static or dynamic. Static algorithms require that all environmental data over the service period, including solar energy arrivals and channel conditions, be obtained before task transmission, and then solve the resource scheduling problem by global planning. Dynamic algorithms require no predicted data and can be further divided into two categories depending on whether they rely on statistical characteristics. The first category includes the BIBM algorithm proposed by D. Zhou et al. in "Mission QoS and Satellite Service Lifetime Tradeoff in Remote Sensing Satellite Networks", published in IEEE Wireless Communications Letters; it jointly considers task acquisition, processing and transmission, and solves the resource scheduling problem based on a state transition probability model. The second category includes the Approximate SARSA algorithm described by A. Ortiz et al. in "Reinforcement Learning for Energy Harvesting Point-to-Point Communications" at the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, which obtains the optimal resource scheduling policy from causal data alone, without needing to know statistical characteristics.

These approaches essentially cover the current state of research on resource scheduling. Static algorithms are too idealized: their non-causality makes them inapplicable to remote sensing satellite networks. The first category of dynamic algorithms is equally inapplicable, because remote sensing satellite networks lack fixed, deterministic statistical characteristics. The second category can be used to solve the resource scheduling problem of remote sensing satellite networks, but existing research mostly targets generic energy harvesting systems and ignores both the task flow of remote sensing satellite networks and the particularities of their environment.

Disclosure of Invention

To address the shortcomings and limitations of the prior art, the invention aims to design an intelligent multi-dimensional joint resource scheduling method that does not depend on known, deterministic statistical characteristics and suits both the environment of a remote sensing satellite network operating under uncertainty and its resource scheduling scenario.

In the intelligent joint resource scheduling method for remote sensing satellite networks in uncertain environments provided by the invention, the established network model fits the environment and resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids both directly solving a highly complex planning problem and the difficulty of a continuous, infinite state space. The method comprises the following steps:

(1) Establishing a remote sensing satellite network model for an uncertain environment: first determine the scale and parameters of the remote sensing satellite network, including the number and positions of the remote sensing satellites and ground stations, and then define the state set S, the action set A, the reward R and the action-value function Q^π(S_i, P_i) of the network.

The state set is $S = B \times D \times H \times E^{H}$. At the start of the i-th time slot, the network state $S_i = (B_i, D_i, H_i, E^{H}_i)$ comprises four parts: the current battery charge B_i, the amount of data already in the data buffer D_i, the channel parameter H_i and the absorbed solar energy E^H_i. Dynamic channel models of the satellite-to-ground and inter-satellite links are established according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and the channel parameter H_i is obtained by simulation; a dynamic energy harvesting model that accounts for the orbital characteristics of the satellite is established, and the absorbed solar energy E^H_i is obtained by simulation.

The action set $A = A_r \times A_t$ consists of two parts, the receive power set A_r and the transmit power set A_t, which can be written as $A_r = \{0, \delta, 2\delta, \dots, P^{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \dots, P^{MAX}\}$, where δ is the step size, 0 means that no data is received or transmitted, and P^MAX is the maximum power value. The transmit power set depends on the link type: it takes one form when the transmission link is a satellite-to-ground link and another form otherwise.

The reward R is the amount of data transmitted by the satellite at the start of the time slot. The action-value function $Q^{\pi}(S_i, P_i) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{i+k} \mid S_i, P_i\right]$ is the expected return obtained after the agent, guided by policy π, performs action P_i in state S_i. This completes the remote sensing satellite network model for an uncertain environment;

(2) Generating environmental parameter data: simulate the remote sensing satellite network model in STK to export raw environmental parameter data over one topology period, and process the raw data in MATLAB to obtain, for each time slot, the link on/off state, the link connection duration, the remote sensing satellite position and the time spent in sunlight; these serve as the environmental parameter data of the intelligent joint resource scheduling method;

(3) Initializing the parameters required by the intelligent joint resource scheduling method: these comprise the number of time slots per period T, the on-board battery capacity B_max, the battery capacity threshold B_min, the data memory capacity D_max, the static power consumption P_cons, the unit time slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic parameter update interval T_copy, the Actor parameter update interval T_train, the total number of training time slots I, the current time slot number i, and the discount factor γ;

(4) Guiding the satellite's power allocation: observe the state S_i; for each feasible action, extract the feature vector f_i(S_i, P_i) of the state-action pair through the defined six-dimensional feature function, which reflects the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor to compute an approximate action value, and use the ε-greedy strategy to select an action P_i from the feasible action set as the power allocation scheme for the current time slot, guiding the satellite's power allocation;

(5) Pre-transitioning the state of the remote sensing satellite network: compute the reward R_i in the uncertain-environment network model and check whether the iteration has finished: if i = I, go to step (10); otherwise go to the next step and carry out a new iteration;

(6) Guiding the satellite's power pre-allocation: observe the pre-state S'_i; for each feasible action, extract the feature vector f'_i(S'_i, P'_i) of the state-action pair through the defined six-dimensional feature function, combine it with the Actor network parameter ω_actor, and use the ε-greedy strategy to select an action P'_i from the feasible action set as the pre-selected power allocation scheme for the next time slot; store the sample (f_i, P_i, R_i, f'_i, P'_i) in the experience memory for subsequent network parameter updates;

(7) Updating and checking the Critic network parameter ω_critic: take the remainder of the current time slot number i divided by the Critic parameter update interval T_copy; if i mod T_copy = 0, update the Critic network parameter by setting ω_critic = ω_actor and go to the next step; otherwise go directly to the next step;

(8) Updating and checking the Actor network parameter ω_actor: take the remainder of the current time slot number i divided by the Actor parameter update interval T_train; if i mod T_train = 0, update ω_actor by gradient descent and go to the next step; otherwise go directly to the next step;

(9) Updating the state, the action and the current time slot number of the remote sensing satellite network: S_{i+1} = S'_i, P_{i+1} = P'_i, i = i + 1; this completes one iteration; then return to step (5);

(10) Obtaining the network parameter ω_critic that guides joint scheduling: output the network parameter ω_critic obtained by training, which ends the intelligent joint resource scheduling method for remote sensing satellite networks in uncertain environments. In practical applications, a joint resource scheduling scheme is generated from this parameter according to the greedy policy (that is, the ε-greedy policy with ε = 0).

For remote sensing satellite networks, on the one hand, the invention jointly considers the task transmission flow of the network and its environmental characteristics, and builds a relatively comprehensive network model. On the other hand, at the design stage it solves the optimization problem with reinforcement learning, which reduces complexity; by defining feature functions and a weight vector and using linear approximation, it handles the continuous state space of the remote sensing satellite network and searches for the optimal resource scheduling scheme on the basis of an accurate state characterization, improving the accuracy of the result.

The invention helps the remote sensing satellite balance battery resources against data transmission in a dynamic environment, ensures efficient task transmission in the remote sensing satellite network, and improves transmission performance.

Compared with the prior art, the invention has the following advantages:

Satellite operating characteristics are reflected in resource scheduling: the design of the joint resource scheduling method fully considers the attributes and working characteristics of the satellite, such as the static energy consumption required by on-board systems such as thermal control and satellite housekeeping to keep the remote sensing satellite operating normally. The satellite's multi-dimensional resources are considered jointly, reflecting the resource scheduling characteristics of joint task acquisition and transmission by the remote sensing satellite, and the optimal resource scheduling scheme is determined on this basis.

The scheduling method is closer to a real remote sensing satellite network: the invention builds a remote sensing satellite network model, obtains through STK simulation the environmental and position data of the network operating over one topology period with given parameters and scale, and processes the raw data in MATLAB to obtain time-slotted environmental and position parameters, making the scenario of the intelligent joint resource scheduling method more realistic.

Complex constrained planning problems are solved with reinforcement learning: the reinforcement learning approach makes the solution of the multi-dimensional joint resource scheduling problem independent of any non-causal data and statistical characteristics. To handle the continuous, infinite state space, the design introduces a six-dimensional feature vector reflecting the environmental characteristics of the remote sensing satellite network and the task transmission characteristics of the remote sensing satellite, maps each state-action pair to this feature vector, and evaluates the quality of an action by linear approximation with a weight vector. This resolves the infinite state space of the remote sensing satellite network and avoids the storage burden that discretizing the state would cause. In addition, the invention constructs two independent networks with the same structure but different parameters, which mitigates overestimation during parameter updates to a certain extent.

Drawings

FIG. 1 is an overall flow diagram of an implementation of the present invention;

FIG. 2 is a schematic diagram of a remote sensing satellite network model in the present invention;

FIG. 3 is a sub-flow diagram of selecting actions based on the ε-greedy policy in the present invention;

FIG. 4 is a sub-flow diagram of the gradient descent strategy of the present invention;

FIG. 5 is a graph of cost values in the present invention;

FIG. 6 is a comparison graph of average periodic rewards in the present invention.

The invention is described in detail below with reference to the attached drawings and examples.

Detailed Description

Example 1

The particularity and dynamic variability of the remote sensing satellite network environment and the diversity of remote sensing satellite energy consumption distinguish remote sensing satellite networks from other communication networks. Research on resource scheduling for remote sensing satellite networks is extensive and can be classified as static or dynamic according to whether environmental data must be predicted. Static algorithms assume known conditions: the environmental data for all future times must be known before the satellite starts a transmission task. Although static algorithms improve the performance upper bound of the remote sensing satellite network, they are overly idealized and non-causal, which limits them to few application scenarios and leaves them unable to serve most real-world scenarios. Dynamic algorithms assume an unknown environment: the remote sensing satellite needs no environmental data in advance. They can be further divided into two categories. The first category is dynamic programming, which, based on a Markov decision process model, can solve such problems when the statistical characteristics of the data, such as state transition probabilities, are known. However, the computational complexity of dynamic programming grows sharply with problem size, placing a heavy computational burden on low-power devices; moreover, not all processes have statistical characteristics, and those characteristics may change with conditions and time, so this approach still has drawbacks.
The second category is based on reinforcement learning and does not presuppose environmental data, a state transition model or similar conditions; the environmental data for each moment becomes available only when that moment arrives, which better matches the reality of remote sensing satellite networks. However, studies based on this approach either omit the data reception process of the remote sensing satellite network or ignore the particularities of its environment, such as time-varying channel conditions and the satellite's alternation between sunlight and the Earth's shadow. In short, current research does not comprehensively consider these characteristics of remote sensing satellite networks, and the multi-dimensional joint resource scheduling problem for such networks requires further research.

In view of this situation, through research and experiments, the invention designs an intelligent multi-dimensional joint resource scheduling method that does not depend on known statistical characteristics and suits both the environment of a remote sensing satellite network operating under uncertainty and its resource scheduling scenario.

In the intelligent joint resource scheduling method for remote sensing satellite networks in uncertain environments provided by the invention, the established network model fits the environment and resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids directly solving a highly complex planning problem. The method comprises the following steps:

(1) Establishing a remote sensing satellite network model for an uncertain environment: the remote sensing satellite network mainly comprises remote sensing satellites that acquire and transmit data and ground stations that receive it. First, determine the scale and parameters of the network, including the number and positions of the remote sensing satellites and ground stations. Then define the state set S, the action set A, the reward R and the action-value function Q^π(S_i, P_i) of the network.

In the present invention, the state set is $S = B \times D \times H \times E^{H}$. At the start of the i-th time slot, the network state $S_i = (B_i, D_i, H_i, E^{H}_i)$ comprises four parts: the current battery charge B_i, the amount of data already in the data buffer D_i, the channel parameter H_i and the absorbed solar energy E^H_i. The battery charge is bounded above by B_max and below by B_min, and the data buffer is bounded above by D_max. The satellite-to-ground and inter-satellite channels follow different channel models, so their channel parameters are computed differently. The absorbed solar energy varies as the remote sensing satellite alternates between sunlight and the Earth's shadow; clearly, no solar energy is supplied while the satellite is in shadow. The invention establishes dynamic channel models of the satellite-to-ground and inter-satellite links according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and obtains the channel parameter H_i by simulation. Accounting for the orbital characteristics of the satellite, it also establishes a dynamic energy harvesting model and obtains the absorbed solar energy E^H_i by simulation.

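The action set $A = A_r \times A_t$ consists of two parts, the receive power set A_r and the transmit power set A_t, which can be written as $A_r = \{0, \delta, 2\delta, \dots, P^{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \dots, P^{MAX}\}$, where δ is the step size, 0 means that no data is received or transmitted, and P^MAX is the maximum power value. The transmit power set takes one form for a satellite-to-ground link and another form otherwise.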

The reward R is the amount of data transmitted by the satellite at the start of the time slot. In the invention, the action-value function $Q^{\pi}(S_i, P_i) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{i+k} \mid S_i, P_i\right]$ is the expected return obtained after the agent, guided by policy π, performs action P_i in state S_i. This completes the remote sensing satellite network model for an uncertain environment. In this step, the method accounts for the data acquisition process of the remote sensing satellite, reflecting the resource scheduling characteristic of jointly planning data acquisition and transmission in the remote sensing satellite network. In addition, the method establishes two channel models for the uncertain environment in which the network operates and accounts for the characteristics of the energy supply, which makes it better suited to remote sensing satellite network scenarios.

(2) Generating environmental parameter data: simulate the remote sensing satellite network model in STK to export raw environmental parameter data over one topology period, including the start time, end time and duration of every link between the remote sensing satellite and each ground station and relay satellite, the longitude, latitude and altitude of the remote sensing satellite, and the durations for which the remote sensing satellite is in sunlight. Then process the raw data in MATLAB, re-slotting it with τ as the unit time slot length to obtain, for each time slot, the time the remote sensing satellite spends in sunlight, the link on/off state and the link duration; these serve as the environmental parameter data of the intelligent joint resource scheduling method. By using the network operating scenario built in STK to obtain the raw environmental parameter data and processing it in MATLAB, this step makes the simulation results of the proposed method more accurate.
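The re-slotting in this step can be sketched as below. The sketch is in Python rather than MATLAB and assumes that each link's visibility windows (or the sunlit intervals) have already been exported as a list of (start, end) times in seconds; the function name and input format are hypothetical. For each slot of length τ it reports whether any window overlaps the slot and for how many seconds, which yields both the per-slot link on/off state with link duration and the per-slot sunlit duration.

```python
def slot_windows(windows, horizon, tau):
    """Re-slot time windows into fixed-length time slots.

    windows: list of (start, end) times in seconds, e.g. link access
             windows or sunlit intervals (input format assumed here).
    horizon: length of the topology period in seconds.
    tau:     unit time slot length in seconds.
    Returns one (is_on, covered_seconds) pair per slot, where
    covered_seconds is the total overlap between the slot and the windows.
    """
    n_slots = int(horizon // tau)
    slots = []
    for k in range(n_slots):
        s0, s1 = k * tau, (k + 1) * tau  # slot k covers [s0, s1)
        overlap = sum(max(0.0, min(s1, e) - max(s0, s)) for s, e in windows)
        slots.append((overlap > 0, overlap))
    return slots


# A link visible from t = 30 s to t = 150 s, with 60 s slots over 240 s.
print(slot_windows([(30, 150)], horizon=240, tau=60))
```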

(3) Initializing the parameters required by the intelligent joint resource scheduling method: these comprise the number of time slots per period T, the on-board battery capacity B_max, the battery capacity threshold B_min, the data memory capacity D_max, the static power consumption P_cons, the unit time slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic parameter update interval T_copy, the Actor parameter update interval T_train, the total number of training time slots I, the current time slot number i, and the discount factor γ.
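The parameter list can be gathered into one configuration object, as in the Python sketch below. Every numeric value is a placeholder chosen only for illustration (the patent lists the parameters but not their values), and the weight matrices are initialized to zeros with one row per action and one column per feature function, matching the matrix shapes described in steps (7) and (8).

```python
from dataclasses import dataclass

import numpy as np

N_FEATURES = 6  # six-dimensional feature function


@dataclass
class SchedulerParams:
    # Placeholder values for illustration only; not taken from the patent.
    T: int = 1440           # number of time slots per topology period
    B_max: float = 1000.0   # on-board battery capacity
    B_min: float = 200.0    # battery capacity threshold
    D_max: float = 5000.0   # data memory capacity
    P_cons: float = 50.0    # static power consumption
    tau: float = 60.0       # unit time slot length
    epsilon: float = 0.1    # exploration rate
    alpha: float = 0.01     # learning rate
    T_copy: int = 100       # Critic parameter update interval (slots)
    T_train: int = 10       # Actor parameter update interval (slots)
    I: int = 100_000        # total number of training time slots
    i: int = 0              # current time slot number
    gamma: float = 0.9      # discount factor


def init_weights(n_actions, n_features=N_FEATURES):
    """Zero-initialized weight matrix: one row per action, one column per feature."""
    return np.zeros((n_actions, n_features))


params = SchedulerParams()
omega_actor = init_weights(35)     # e.g. 35 actions on one link type (hypothetical)
omega_critic = omega_actor.copy()  # separate copy: two parameter sets, same shape
```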

(4) Guiding the satellite's power allocation: observe the state S_i; for each feasible action, extract the feature vector f_i(S_i, P_i) of the state-action pair through the defined six-dimensional feature function, which reflects the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor to compute an approximate action value, and use the ε-greedy strategy to select an action P_i from the feasible action set as the power allocation scheme for the current time slot, guiding the satellite's power allocation. In this step, the six-dimensional feature function captures the characteristics of joint multi-dimensional resource scheduling in the remote sensing satellite network from six perspectives; it extracts the feature vectors of state-action pairs that are used to approximate the value function. The ε-greedy strategy keeps the resource scheduling method from getting trapped in a local optimum.
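The selection rule in this step can be sketched as follows: the approximate action value of each feasible action is the dot product of that action's row of the weight matrix with its feature vector, and ε-greedy either explores uniformly or exploits the largest value. The six feature functions are not specified in this section, so the sketch takes the feature vectors as a precomputed array; the function names are hypothetical.

```python
import numpy as np


def approx_q(omega, features, feasible):
    """Linear approximation Q(S, a_j) = omega[a_j] . f(S, a_j) for each feasible a_j.

    omega:    (|A|, 6) weight matrix, one row per action in the action set A
    features: (len(feasible), 6) array; row j is the feature vector f(S, a_j)
    feasible: list of row indices of omega (the feasible actions in state S)
    """
    return np.sum(omega[feasible] * features, axis=1)


def epsilon_greedy(omega, features, feasible, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise take the best approximate Q."""
    if rng.random() < epsilon:
        return feasible[rng.integers(len(feasible))]
    return feasible[int(np.argmax(approx_q(omega, features, feasible)))]


rng = np.random.default_rng(0)
omega = np.array([[1.0, 0, 0, 0, 0, 0],
                  [0, 2.0, 0, 0, 0, 0],
                  [0, 0, 3.0, 0, 0, 0]])
feats = np.ones((2, 6))  # feature vectors for feasible actions 0 and 2
print(epsilon_greedy(omega, feats, [0, 2], 0.0, rng))  # 2
```

Scoring only the feasible rows keeps infeasible power pairs (for example, when the link is off) out of both the exploration and exploitation branches.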

(5) Pre-transitioning the state of the remote sensing satellite network: compute the reward R_i in the uncertain-environment network model and check whether the iteration has finished: if so, go to step (10); otherwise go to the next step and carry out a new iteration. Because the proposed method converges after a certain number of iterations, the intelligent joint resource scheduling method is defined to end when the current time slot number i equals the total number of training time slots I.

(6) Guiding the satellite's power pre-allocation: observe the pre-state S'_i; for each feasible action, extract the feature vector f'_i(S'_i, P'_i) of the state-action pair through the defined six-dimensional feature function, combine it with the Actor network parameter ω_actor, and use the ε-greedy strategy to select an action P'_i from the feasible action set as the pre-selected power allocation scheme for the next time slot; store the sample (f_i, P_i, R_i, f'_i, P'_i) in the experience memory for subsequent network parameter updates. Power pre-allocation here means choosing the power for the start of the next time slot while still within the current time slot i, hence the term pre-allocation; the prime mark ' denotes the pre-variables f'_i, P'_i and S'_i. Because the experience memory has limited capacity, once the number of samples reaches the capacity limit, each new sample replaces the oldest one.
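The experience memory described above can be sketched with a bounded deque, which discards the oldest entry when a new one arrives at full capacity. The class and method names are hypothetical, and the uniform random mini-batch sampling is an assumption of this sketch; the text only states that stored samples are used for later parameter updates.

```python
import random
from collections import deque


class ExperienceMemory:
    """Fixed-capacity memory of samples (f_i, P_i, R_i, f'_i, P'_i).

    Once full, appending a new sample discards the oldest one
    (deque with maxlen), so new samples replace old ones.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, f, p, r, f_next, p_next):
        self.buffer.append((f, p, r, f_next, p_next))

    def sample(self, batch_size):
        # Uniform random mini-batch: an assumption, the sampling rule is not specified.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


mem = ExperienceMemory(capacity=2)
for k in range(3):
    mem.store(f=[k], p=k, r=float(k), f_next=[k + 1], p_next=k + 1)
print(len(mem), mem.buffer[0][1])  # 2 1  (the sample with p = 0 was dropped)
```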

(7) Updating and checking the Critic network parameter ω_critic: take the remainder of the current time slot number i divided by the Critic parameter update interval T_copy; if i mod T_copy = 0, update the Critic network parameter by setting ω_critic = ω_actor and go to the next step; otherwise go directly to the next step. The Critic network parameter ω_critic is used to compute the approximate target action value; it is also the output of the method and guides joint multi-dimensional resource scheduling. ω_critic is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own ω_critic, which is selected during computation according to the current link. Because the two link types have different transmit action sets, the ω_critic of the satellite-to-ground link and the ω_critic of the inter-satellite link have different numbers of rows.
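The periodic Critic update can be sketched as follows, keeping one weight matrix per link type with a different number of rows for each. The action-set sizes and names are hypothetical. Copying the array, rather than aliasing it, is what keeps the Critic and Actor as two separate parameter sets between synchronizations.

```python
import numpy as np

N_FEATURES = 6
# Hypothetical action-set sizes: the two link types have different
# transmit action sets, hence different numbers of rows.
N_ACTIONS = {"satellite_to_ground": 55, "inter_satellite": 35}

omega_actor = {link: np.zeros((n, N_FEATURES)) for link, n in N_ACTIONS.items()}
omega_critic = {link: w.copy() for link, w in omega_actor.items()}


def maybe_sync_critic(i, T_copy, omega_actor, omega_critic):
    """Step (7): if i mod T_copy == 0, set omega_critic = omega_actor for every link type."""
    if i % T_copy == 0:
        for link, w in omega_actor.items():
            # Copy, so later Actor updates do not leak into the Critic.
            omega_critic[link] = w.copy()
        return True
    return False
```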

(8) Updating and checking the Actor network parameter ω_actor: take the remainder of the current time slot number i divided by the Actor parameter update interval T_train; if i mod T_train = 0, update the Actor network parameter ω_actor by gradient descent and go to the next step; otherwise go directly to the next step. The Actor network parameter ω_actor is used to select the power allocation scheme. ω_actor is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own ω_actor, which is selected during computation according to the current link. Because the two link types have different transmit action sets, the ω_actor of the satellite-to-ground link and the ω_actor of the inter-satellite link have different numbers of rows.
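The text does not spell out the gradient descent step. One plausible reading, sketched below purely as an assumption, is a SARSA-style semi-gradient update on the squared temporal-difference error: the target R_i + γ·ω_critic[P'_i]·f'_i is formed with the Critic parameters, and only the Actor row of the action P_i actually taken moves toward it. Samples are the (f_i, P_i, R_i, f'_i, P'_i) tuples stored in step (6), with actions represented as row indices; the function name is hypothetical.

```python
import numpy as np


def actor_update(omega_actor, omega_critic, batch, alpha, gamma):
    """One pass of semi-gradient descent on 0.5 * td_error**2 (assumed form).

    batch: iterable of (f, p, r, f_next, p_next) where f, f_next are 6-dim
           float feature vectors and p, p_next are action (row) indices.
    The target uses omega_critic and is treated as a constant, so the
    gradient w.r.t. omega_actor[p] is -td_error * f.
    """
    for f, p, r, f_next, p_next in batch:
        target = r + gamma * omega_critic[p_next] @ f_next
        td_error = target - omega_actor[p] @ f
        omega_actor[p] += alpha * td_error * f  # descent step, in place
    return omega_actor
```

Applying samples one at a time is a simplification; averaging the gradient over a mini-batch drawn from the experience memory would be an equally reasonable reading.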

(9) Updating the state, the action, and the current iteration number of the remote sensing satellite network: set S_{i+1} = S′_i and P_{i+1} = P′_i, which completes one iteration (the slot number advances from i to i+1), and then return to step (5) for the remote sensing satellite network state pre-transition. In step (5), a check decides whether to end the iterations and output the result or to start a new iteration. In this step, the new state S_{i+1} and the new action P_{i+1} are exactly the pre-state S′_i and the pre-action P′_i from steps (5) and (6).

(10) Obtaining the network parameter ω_critic that guides joint scheduling: output the network parameter ω_critic obtained by training with the intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network; the method then ends. In practical applications, a joint resource scheduling scheme is generated from this parameter according to the greedy policy (i.e., the ε-greedy policy with ε = 0). Specifically, at the start of each time slot during satellite operation, the features of each state-action pair (S_i, P_i), formed by the current state S_i and each action in the feasible action set, are extracted; the approximate action value of each pair is calculated with ω_critic; and the action P_i with the largest approximate action value is selected and executed as the best action. The satellite then moves to the next time slot and repeats these steps.
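A minimal sketch of this deployment-time greedy selection follows. It assumes, as one reading of the matrix layout described in step (7), that the approximate action value of action a is the dot product of row a of ω_critic with the feature vector f_i(S_i, a). The function name and the callable `features` are hypothetical stand-ins for the feature extraction of step (4b).

```python
import numpy as np


def greedy_action(feasible_actions, features, omega_critic):
    """Deployment-time choice: the feasible action with the largest approximate value.

    feasible_actions: action indices (rows of omega_critic) in {A_f}_i
    features: callable mapping an action index a to f_i(S_i, a), length 6
    """
    # Linear approximation: value of action a = omega_critic[a] . f_i(S_i, a)
    values = [float(omega_critic[a] @ features(a)) for a in feasible_actions]
    return feasible_actions[int(np.argmax(values))]
```

At run time this is the ε-greedy rule with ε = 0, evaluated once at the start of every time slot; ties go to the first feasible action because `np.argmax` returns the first maximum.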

To address the shortcoming that the prior art does not comprehensively consider the remote sensing satellite network scenario or the task transmission flow of remote sensing satellites, the invention takes the working and environmental characteristics of remote sensing satellites into account, obtains time-varying environment data and link information from a simulated network, uses reinforcement learning as the algorithmic framework and the Markov decision process as the basic model, and incorporates linear approximation and a dual-network structure. This yields a complete technical scheme for intelligent joint resource scheduling of remote sensing satellites in an environment-uncertain network, and it solves the problem of jointly and effectively using the satellites' multidimensional resources to optimize network transmission performance. The idea of the invention is as follows. First, a remote sensing satellite network model is established and the task transmission flow of the remote sensing satellite is defined. Second, with the maximum network transmission data volume as the objective function and constraints derived from the environment and the satellite attributes, the joint resource scheduling problem is modeled as an optimization problem; this problem cannot be solved directly, because non-causal data are unavailable in a remote sensing satellite network. Third, an intelligent joint resource scheduling method is proposed that, using only causal data, guides the satellite toward optimal joint resource scheduling through continuous learning.
During the learning period, the remote sensing satellite starts with no experience, accumulates experience through trial and error, and keeps updating its value-based parameters until they converge. The converged parameters output by the method serve as the basis for the final power allocation, realizing joint multidimensional resource scheduling. Beforehand, the network is simulated to obtain the link connectivity and energy arrival within one topology period as the environmental parameters of the method. At the start of each time slot, the satellite allocates power according to an ε-greedy strategy: with probability ε it selects a power allocation scheme at random, and with probability 1 − ε it selects the scheme that maximizes the action-value function. Because the satellite's state space is infinite, the invention introduces a weight vector: the inner product of the weight vector and the feature vector obtained by feature extraction linearly approximates the action-value function. After a power allocation scheme is determined and executed, the satellite obtains a reward that evaluates the allocation, and a state pre-transition is performed. A power allocation scheme for the next time slot is then selected with the ε-greedy strategy and executed at the start of that slot. This process repeats as the time slots advance, and each repetition can be regarded as one time-slot iteration. In every time slot, some of the parameters are saved to an experience memory as samples for the subsequent network parameter update.
The Actor network parameters are updated at regular intervals by gradient descent: the weight vector is moved in the direction opposite to the gradient, which reduces the error between the approximate action-value function and the action-value function. The Critic network parameters are updated by copying them from the Actor network. To ensure convergence, both the exploration rate ε and the learning rate α decrease as the number of iterations increases. Intuitively, as iterations accumulate, the joint multidimensional resource scheduling strategy guided by the weight vector becomes increasingly effective and the room for further improvement shrinks. Simulation results show that the method converges after a certain number of iterations and outperforms the other compared methods under the same conditions.
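The text states only that ε and α decrease as the iterations increase, not how. The sketch below uses a hypothetical inverse-time schedule, which is one common choice among several; the function name, the decay constant, the floor, and the starting values are all assumptions.

```python
def decayed(initial, i, decay=1e-3, floor=0.0):
    """Hypothetical inverse-time decay: initial / (1 + decay * i), clipped at floor."""
    return max(floor, initial / (1.0 + decay * i))


# Hypothetical starting values; the source does not give them.
EPSILON_0, ALPHA_0 = 0.5, 0.1
epsilon_i = decayed(EPSILON_0, 2000, floor=0.01)  # exploration rate at slot 2000
alpha_i = decayed(ALPHA_0, 2000)                  # learning rate at slot 2000
```

Any schedule that decreases toward a small value fits the description. A floor on ε keeps a little exploration alive, while letting α shrink steadily supports convergence of the weights.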

Example 2

The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Example 1. To keep the resource scheduling method from falling into a local optimum, an ε-greedy strategy is adopted in the learning stage. Early in the learning stage, the remote sensing satellite is more inclined to explore, i.e., to adopt resource scheduling schemes it has not yet tried; later in the learning stage, it is greedier, i.e., it selects the best resource scheduling scheme based on its existing experience.

In step (4), the satellite's power allocation is guided by using the ε-greedy strategy to select an action P_i from the feasible action set as a pair of receiving and transmitting powers. Under the current environmental conditions, the remote sensing satellite then receives and transmits data according to this power allocation scheme at the cost of energy consumption. The specific steps are as follows:

(4a) Computing the feasible action set {A_f}_i: because the remote sensing satellite is constrained by its on-board battery energy and data buffer capacity, it must allocate power within these resource constraints. The battery capacity threshold B_min prevents the battery energy from dropping so low that the satellite's service life is shortened. The invention therefore specifies that when the battery capacity B_i of the remote sensing satellite is lower than B_min + P_cons × τ, the satellite neither transmits nor receives any data. From the network state S_i, the feasible action set {A_f}_i is computed as the set of all actions P_i that satisfy the conditions; each action consists of a receiving power P_ir and a transmitting power P_it. The resource constraints are as follows:

where τ_ih denotes the link duration at the start of the ith time slot, and C_it(P_it,H_i) denotes the link transmission rate at the start of the ith time slot given the current transmitting power P_it and channel parameter H_i, which can be calculated as follows:

where B_c (Hz) denotes the channel bandwidth.
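Because the constraint and rate formulas are not reproduced above, the following Python sketch fills them in with explicitly assumed forms: a Shannon-type rate B_c·log2(1 + P_it·H_i), with H_i treated as a noise-normalised channel gain; an energy budget (P_ir + P_it + P_cons)·τ ≤ B_i − B_min; and a transmit-data limit C_it·τ_ih ≤ D_i. Only the low-battery rule (no transmission or reception when B_i < B_min + P_cons × τ) comes directly from the text. Under the assumed energy budget no action would pass in that case, so the explicit early return keeps the idle action (0, 0) available. Function names are hypothetical.

```python
import math


def link_rate(p_t, h, bandwidth_hz):
    """Assumed Shannon-type rate in bit/s: B_c * log2(1 + P_it * H_i).

    H_i is treated as a channel gain already normalised by noise power;
    the original rate formula is not reproduced in the text.
    """
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def feasible_actions(B_i, D_i, h, tau, tau_ih, recv_set, send_set,
                     B_min, P_cons, bandwidth_hz):
    """Return the (P_ir, P_it) pairs allowed in slot i under assumed constraints."""
    # Stated rule: below B_min + P_cons * tau the satellite neither sends nor receives.
    if B_i < B_min + P_cons * tau:
        return [(0.0, 0.0)]
    actions = []
    for p_r in recv_set:
        for p_t in send_set:
            energy = (p_r + p_t + P_cons) * tau                # assumed energy model
            sent = link_rate(p_t, h, bandwidth_hz) * tau_ih    # bits sent in slot i
            if energy <= B_i - B_min and sent <= D_i:          # battery and buffer checks
                actions.append((p_r, p_t))
    return actions
```

For example, with a 1 W step, a transmitting power whose data volume over τ_ih would exceed the buffered data D_i is dropped, while lower powers remain feasible.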

(4b) Computing the six-dimensional feature vector f_i(S_i,P_i): for each state-action pair (S_i,P_i), the feature vector f_i(S_i,P_i) is computed from six feature functions. Each element of the feature vector is a value no greater than 1 that rates, in its own dimension, the quality of performing action P_i in state S_i; each feature is therefore a function of the state and the action.

(4c) Selecting action P according to epsilon-greedy policy_i: according to the epsilon-greedy strategy, the epsilon-greedy strategy means that the remote sensing satellite has a feasible action set { A }_f}_iIn which either the action with the highest value of the approximate action is chosen with a probability of 1-epsilon or the action is chosen randomly with a probability of epsilon, only one result being the action P chosen according to the epsilon-greedy strategy_iExpressed as follows:

wherein,

denotes the approximate action-value function of the ith time slot, computed from the state-action pair (S_i,P_i) and the Actor network parameter ω_actor.

Here, by the definition of the state set S, the state space of the remote sensing satellite network is continuous and infinite. Even if the states were discretized, the state table would have to be enormous to keep distortion low, which makes storage difficult. The invention therefore uses linear approximation to handle the continuous states directly.
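The ε-greedy rule of (4c), combined with the linear approximation, can be sketched as below. This is an illustrative sketch that assumes row a of ω_actor holds the weights for action a, so that the approximate action value is ω_actor[a]·f_i(S_i, a). The function name is hypothetical, and NumPy's `Generator` supplies the randomness.

```python
import numpy as np


def epsilon_greedy(feasible, features, omega_actor, epsilon, rng):
    """Select P_i from the feasible set {A_f}_i with the epsilon-greedy rule.

    feasible: action indices (rows of omega_actor)
    features: callable a -> f_i(S_i, a), the 6-dim feature vector
    rng: a numpy.random.Generator
    """
    if rng.random() < epsilon:                        # explore with probability epsilon
        return feasible[int(rng.integers(len(feasible)))]
    values = [float(omega_actor[a] @ features(a)) for a in feasible]
    return feasible[int(np.argmax(values))]           # exploit with probability 1 - epsilon
```

Only the six weights in one row are needed per candidate action, so the storage cost grows with the size of the action set rather than with the (infinite) state space.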

Example 3

The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Example 1. Computing the six-dimensional feature vector f_i(S_i,P_i) in step (4b) specifically involves the following six dimensions.

(4b1) Calculating the first dimension: the first dimension indicates whether the action accounts for the battery energy state, i.e., whether the energy consumed by performing the action can eliminate potential energy overflow caused by absorbing solar energy. In a resource-limited remote sensing satellite network, solar energy is precious, and the satellite should make full use of the solar energy it acquires to store and transmit data. Its feature function f₁(S_i,P_i) is expressed as follows:

wherein,

denotes the energy consumption of the current time slot. From the energy perspective, this dimension reflects both the environment of the remote sensing satellite network and the satellite's own attributes: it accounts for the acquisition of solar energy, the static energy consumption of each time slot, and the upper limit of the on-board battery capacity.

(4b2) Calculating the second dimension: the second dimension indicates whether the action accounts for the data buffer state, i.e., whether the amount of data sent can eliminate potential data overflow caused by received data. Data overflow means that the energy spent receiving data does not match the data actually received, so part of that energy is wasted without yielding its expected return. Its feature function f₂(S_i,P_i) is expressed as follows:

wherein,

denotes the amount of data received by the satellite in the ith time slot,

is the amount of data transmitted by the satellite at the ith time slot,

D^{R,max} denotes the maximum amount of received data. This dimension considers the satellite's data-receiving process from the perspective of the data buffer.

(4b3) Calculating the third dimension: the third dimension indicates whether the action is consistent with the optimal power allocation scheme. Because the satellite-ground link and the inter-satellite link are modeled differently, the feature function f₃(S_i,P_i) must be expressed separately for each link condition.

Wherein,

consists of a receiving power and a transmitting power and is obtained with the Lagrange multiplier method, with the objective of maximizing the total amount of data transmitted in the current and next time slots under the multidimensional resource constraints. Two consecutive time slots have four link-switching cases: satellite-ground link to satellite-ground link, inter-satellite link to inter-satellite link, inter-satellite link to satellite-ground link, and satellite-ground link to inter-satellite link.

For two consecutive satellite-ground links, the optimal power is solved according to the water-filling theorem.

where B_s = P_cons × τ denotes the static energy consumption and P_i^WF denotes the optimal power value,

is the average of the historical channel parameters and is used to estimate the channel parameter at the next time instant. B_[i,i+1] denotes the maximum energy available for data transmission over the ith and (i+1)th time slots. To guarantee that the optimal power is feasible, the invention imposes the following constraint:

wherein,

denotes the maximum transmit power within the current feasible action set,

denotes the floor (round-down) operation, and δ_i denotes the step size of the transmit power set in the current slot. Then, the total amount of data transmitted over the two time slots,

can be expressed as follows:

wherein,

denotes, for the current-time power allocation scheme

the maximum transmit power in the feasible action set after the transition to the next time.
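For the satellite-ground to satellite-ground case, the classic two-channel water-filling solution and the subsequent round-down to the power grid can be sketched in Python. The sketch assumes a per-slot rate of log2(1 + P·g) and that the energy B_[i,i+1] is shared as (P_1 + P_2)·τ ≤ B_[i,i+1]; `g_next_est` stands for the historical-mean channel estimate. The rule in `quantize` (round down to a multiple of δ_i, then cap at the feasible maximum) is one reading of the constraint above rather than a reproduction of it, and both function names are hypothetical.

```python
import math


def water_filling_two_slots(energy_budget, tau, g_now, g_next_est):
    """Split energy B_[i,i+1] over slots i and i+1 by classic water-filling.

    Assumes rate per slot = log2(1 + P * g) with g > 0, so the problem is
    max sum_k log2(1 + P_k g_k)  s.t.  (P_1 + P_2) * tau <= energy_budget.
    """
    p_total = max(0.0, energy_budget / tau)
    inv = [1.0 / g_now, 1.0 / g_next_est]
    mu = (p_total + inv[0] + inv[1]) / 2.0        # water level if both slots are active
    if mu <= max(inv):                            # weaker slot would get no power
        best = 0 if inv[0] < inv[1] else 1
        powers = [0.0, 0.0]
        powers[best] = p_total                    # all power to the stronger slot
        return powers
    return [mu - inv[0], mu - inv[1]]


def quantize(p, step, p_max_feasible):
    """Round down to the power grid of step delta_i and cap at the feasible max."""
    return min(math.floor(p / step) * step, p_max_feasible)
```

The water level μ is the same for both slots, so the slot with the better (estimated) channel always receives at least as much power, which is the defining property of water-filling.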

For two consecutive inter-satellite links, the optimal power is solved by linear programming.

where B′_i and D′_i denote the resources remaining at the current time once the resource allocation at the next time is known. They can be calculated as follows:

denotes the maximum transmit power in the feasible action set at the next time under the current power allocation scheme [P_ir, 0]. Formula (1) is then applied again, so that

is limited, yielding

Then, the total amount of data transmitted over the two time slots,

can be expressed as follows:

For the inter-satellite link to satellite-ground link case, the optimal power allocation scheme is solved by the Lagrange multiplier method.

In formulas (2)-(3),

is replaced by

which yields B′_i and D′_i, and then

is expressed as follows:

The optimal power is limited by formula (1) to obtain

Then, the total amount of data transmitted over the two time slots,

can be expressed as follows:

For the satellite-ground link to inter-satellite link case, the optimal power scheme is solved by the Lagrange multiplier method.

The optimal power is limited by formula (1) to obtain the optimal power scheme

Then, the total amount of data transmitted over the two time slots,

can be expressed as follows:

In summary, P_ir is varied and the corresponding

is calculated. By comparing the corresponding

values, the optimal power allocation scheme

is found such that

is maximized. This reflects the effect of power allocation over two consecutive time slots, the current one and the next.
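The final search over P_ir can be sketched as a plain enumeration. The callable `plan_for_recv` is hypothetical: it stands in for whichever of the four case-specific solvers applies and returns the transmitting power and the two-slot data total for a given P_ir.

```python
def best_two_slot_scheme(recv_powers, plan_for_recv):
    """Return (P_ir, P_it, total_data) maximising the two-slot data total.

    plan_for_recv(p_r) -> (p_t, total_data) is a hypothetical stand-in for
    the water-filling, linear-programming or Lagrange-multiplier solver.
    Ties are broken in favour of the first P_ir in recv_powers.
    """
    candidates = ((p_r, *plan_for_recv(p_r)) for p_r in recv_powers)
    return max(candidates, key=lambda c: c[2])
```

Because the receiving power set is small and discrete, exhaustive enumeration is cheap and avoids any assumption that the two-slot total is concave in P_ir.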

(4b4) Calculating the fourth dimension: the fourth dimension indicates whether network resources are fully utilized when energy is abundant, so that energy is not wasted. That is, when the energy supply is abundant, the remote sensing satellite should schedule resources with the largest energy consumption, so that more solar energy can be absorbed and stored in subsequent time slots. Its feature function f₄(S_i,P_i) is expressed as follows:

wherein,

denotes the maximum energy that can be consumed within the feasible action set of the remote sensing satellite in the current time slot. This dimension reflects the fact that the capacity of the on-board battery has an upper limit.

(4b5) Calculating the fifth dimension: in the second dimension, the feature value corresponding to data overflow is defined as 0, because the amount of data actually received does not match the energy spent, which wastes energy. The fifth dimension complements the second: it indicates that when power is abundant, the power wasted through data overflow is negligible. Its feature function f₅(S_i,P_i) is expressed as follows:

This dimension reflects the fact that the capacity of the remote sensing satellite's data buffer has an upper limit.

(4b6) Calculating the sixth dimension: the sixth dimension concerns the received power allocation. Because the data memory has an upper capacity limit, a larger received power does not necessarily mean that more data is stored. The feature function f₆(S_i,P_i) therefore reflects the effectiveness of the received power allocation, as follows:

f₆(S_i,P_i) is the sixth feature function of the ith slot. This dimension represents the efficiency with which the remote sensing satellite receives data, i.e., whether the energy spent matches the data actually stored in the data buffer.

The outputs of the six feature functions form the six-dimensional feature vector that guides satellite power allocation, and they are an important basis for selecting action P_i in step (4c).

Because the state space of the remote sensing satellite network is continuous and infinite, the value function cannot be solved directly. To obtain an approximate action-value function, the invention uses linear approximation: the dot product of the six-dimensional feature vector and the weight vector gives the approximate action value, which serves as the basis for selecting the power allocation scheme. The six-dimensional feature vector is produced by six feature functions of the state and the action, and its definition is closely tied to the environment of the remote sensing satellite network and to satellite attributes such as its task transmission characteristics.

The method is based on reinforcement learning and lets the remote sensing satellite perform joint multidimensional resource scheduling using only causal data. It handles the infinite state space by linear approximation with six-dimensional feature functions and a weight vector, and it mitigates, to a certain extent, overestimation during parameter updates by constructing two independent networks with the same structure but different parameters.

Example 4

The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Examples 1-3, and the ε-greedy strategy used in step (6) is the same as that in step (4). The difference is that the exploration rate of the ε-greedy strategy in step (6) is ε′_i, which is updated according to ε′_i = ε_{i+1}, and

Since step (6) pre-allocates power for the next slot, the exploration rate ε′_i must be further reduced compared with step (4). Throughout the "learning" process the exploration rate keeps declining conservatively, i.e., the power allocation scheme with the largest approximate action value is selected with increasing probability.

Example 5

The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Examples 1-4. In step (8), ω_actor is updated according to the gradient descent strategy; this parameter update is the process of continuously "learning" and optimizing the weight vector. It comprises the following steps:

(8a) Sampling: the samples (f_i, P_i, R_i, f′_i, P′_i) in the experience memory that share the same P_i are placed in one group, and the number of samples in each group is denoted M_P. Because the parameters of the satellite-ground link and the inter-satellite link are updated independently, each link type has its own experience memory, from which its own Actor network parameters are sampled and updated.

(8b) Calculating the cost function Y(ω_actor) of each group of samples: for each group, the cost function Y(ω_actor) is calculated as:

Wherein,

denotes the approximate target action-value function;

(8c) Updating ω_actor: the gradient descent strategy is applied to the cost function with respect to ω_actor to complete its update:

where the subscript n denotes the time slot number corresponding to the sample,

denotes the learning rate of the current time slot; this assignment completes the update of the Actor network parameter ω_actor.

The invention obtains the approximate action-value function by linear approximation and uses it to approximate the action-value function. The Actor network parameter ω_actor is therefore updated during "learning" in the direction that reduces the error between the action-value function and the approximate action-value function.
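Since the formulas for Y(ω_actor), the target, and the update are not reproduced above, the sketch below assumes the common form for a linear approximator with a separate target network: target y_n = R_n + γ·ω_critic[P′_n]·f′_n, a per-group cost equal to the mean of (y_n − ω_actor[P]·f_n)²/2, and one gradient step of size α on the row of ω_actor belonging to that group's action P. These forms and the function name are assumptions, not the patent's exact equations.

```python
from collections import defaultdict

import numpy as np


def actor_update(samples, omega_actor, omega_critic, alpha, gamma):
    """One gradient-descent pass over experience samples (step 8), assumed form.

    samples: iterable of (f, P, R, f_next, P_next), with P, P_next row indices.
    Assumed target: y = R + gamma * omega_critic[P_next] @ f_next.
    Assumed cost per group: mean of (y - omega_actor[P] @ f)**2 / 2.
    """
    groups = defaultdict(list)                 # (8a) group samples sharing the same P
    for s in samples:
        groups[s[1]].append(s)
    new = omega_actor.copy()
    for p, group in groups.items():
        grad = np.zeros(omega_actor.shape[1])
        for f, _, r, f_next, p_next in group:
            y = r + gamma * float(omega_critic[p_next] @ f_next)  # (8b) Critic target
            q = float(omega_actor[p] @ f)                         # current approximation
            grad -= (y - q) * np.asarray(f)    # gradient of (y - w.f)**2 / 2 w.r.t. w
        new[p] = omega_actor[p] - alpha * grad / len(group)       # (8c) descent step
    return new
```

Evaluating the target with ω_critic rather than ω_actor is what keeps the target fixed between copies, which is the dual-network idea the text credits with reducing overestimation.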

The following detailed example further illustrates the invention.

Example 6

The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Examples 1-5. Referring to FIG. 1, it comprises the following steps:

Step 1, determine the scale of the remote sensing satellite network and take the Markov decision process as the basic model of the joint multidimensional resource scheduling problem, introducing the concepts of the state set, the action set, the reward, and the action-value function.

Referring to FIG. 2, which is a schematic diagram of the remote sensing satellite network model of the invention, the network consists mainly of remote sensing satellites, relay satellites, and ground stations. Inter-satellite links are established between satellites, and satellite-ground links between satellites and ground stations. An inter-satellite link carries either one-way transmission from a remote sensing satellite to a relay satellite or two-way transmission between relay satellites. A satellite-ground link carries only one-way transmission from a remote sensing satellite or a relay satellite to a ground station. To complete continuous information transmission tasks, the remote sensing satellites, relay satellites, and ground stations must cooperate. The remote sensing satellite network is modeled in the following steps:

(1a) Specifying the scale of the remote sensing satellite network: ground stations, remote sensing satellites, and relay satellites together form the remote sensing satellite network. The ground stations are GS = {GS₁, GS₂, ..., GS_J}, where J is the total number of ground stations. Ground stations receive the data transmitted from remote sensing satellites and relay satellites and are the destination of all data. The remote sensing satellites are RSS = {RSS₁, RSS₂, ..., RSS_K}, where K is the total number of remote sensing satellites. A remote sensing satellite samples environmental information with on-board equipment, stores it as data, and then transmits the data to a ground station or a relay satellite, so it is the origin of the remote sensing data. The relay satellites are RS = {RS₁, RS₂, ..., RS_L}, where L is the total number of relay satellites. Relay satellites help remote sensing satellites store and forward data.

(1b) Time discretization: for ease of analysis, the invention divides continuous time into time slots of equal length τ, and the total number of time slots of network operation is I. The ith slot is denoted slot_i = [t_i, t_{i+1}], where i = 0, 1, ..., I − 1; t_0 = 0 denotes the start of operation and t_I denotes the end of operation. Hereafter, the subscript i denotes the value of a variable at the start time t_i of the ith time slot.

(1c) Defining the state set S: the state set S consists of the multidimensional resources, mainly the battery state B_i, the remaining battery capacity in joules (J); the data buffer state D_i, the amount of data stored in the data buffer in bits; the channel parameter H_i, which reflects the link's data transmission capacity and can be obtained from on-board sensors; and the solar energy absorbable by the remote sensing satellite,

in joules (J). H_i and

describe the environmental state, while B_i and D_i describe the state of the remote sensing satellite itself.

(1d) Defining the action set A: actions are expressed as power values in watts (W). The receiving power, the satellite-ground transmitting power, and the inter-satellite transmitting power each take values in a discrete finite set with a fixed step. The power level affects the amount of data received and transmitted as well as the energy consumed. The satellite-ground transmitting power set can be expressed as

The received power set may be expressed as

where δ denotes the step size, 0 means that no data is received or transmitted, and P^MAX denotes the maximum receiving or transmitting power. Because distances and channel conditions differ between remote-sensing-satellite-to-ground-station links and remote-sensing-satellite-to-relay-satellite links, the maximum inter-satellite transmitting power is generally larger than the maximum satellite-ground transmitting power.
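The discrete power sets, and the receiving/transmitting pairs that make up the rows of ω_actor and ω_critic, can be enumerated as below. The sketch assumes that P^MAX is an integer multiple of δ and that every (receiving, transmitting) combination is an action; the step and maximum values in the usage lines are hypothetical.

```python
from itertools import product


def power_set(step, p_max):
    """{0, step, 2*step, ..., P^MAX} in watts; assumes p_max is a multiple of step."""
    n = int(round(p_max / step))
    return [k * step for k in range(n + 1)]


def action_pairs(recv_set, send_set):
    """All (P_ir, P_it) combinations; one row of the weight matrix per pair."""
    return list(product(recv_set, send_set))


# Hypothetical sizes: receiving up to 2 W, satellite-ground up to 3 W,
# inter-satellite up to 6 W, all with a 1 W step.
recv = power_set(1.0, 2.0)
sat_ground_actions = action_pairs(recv, power_set(1.0, 3.0))   # 3 x 4 = 12 actions
inter_sat_actions = action_pairs(recv, power_set(1.0, 6.0))    # 3 x 7 = 21 actions
```

With a larger inter-satellite P^MAX, the inter-satellite action set is larger, which is why the two links need weight matrices with different numbers of rows.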

(1e) Defining the reward R: the reward is what the agent obtains after an action it performs drives it from one state to another. The reward is a numerical value that can be set according to the scenario. Since the task of the remote sensing satellite is data transmission, the invention takes the amount of data transmitted by the satellite in one time slot,

(in GB), as the reward R_i. The more data transmitted, the larger the reward, and vice versa.

(1f) Defining the action-value function

which denotes the expected return the agent obtains after performing action P_i in state S_i under the guidance of policy π. The action value evaluates the effect of the action on all subsequent times, as follows:

where the policy π is the basis on which the agent selects actions. Because continuing tasks have no terminal state, a discount factor γ is introduced into the cumulative reward so that the return converges.
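The discounted return that the action value takes the expectation of can be illustrated on a finite reward trace. The sketch below truncates the infinite sum to the rewards provided, which is an assumption made for illustration. With 0 ≤ γ < 1 and rewards bounded by R_max, the full infinite sum is bounded by R_max/(1 − γ), which is why the discount makes the return converge.

```python
def discounted_return(rewards, gamma):
    """G_i = sum_k gamma**k * R_{i+k}, truncated to the given reward trace."""
    g = 0.0
    for r in reversed(rewards):   # Horner-style accumulation from the last reward
        g = r + gamma * g
    return g


# Example: three slots each transmitting 1 GB with gamma = 0.5
# gives 1 + 0.5 + 0.25 = 1.75.
g_example = discounted_return([1.0, 1.0, 1.0], 0.5)
```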

Step 2, pre-generate the data for one topology period, including the link on/off status, the link connection time, and the sunlit time of the remote sensing satellite in each time slot, as the environmental data parameters of the method.

(2a) For a given set of remote sensing satellite orbit, relay satellite orbit, and ground station position parameters, the network is simulated with STK, and the link connection, position, and environment information within one topology period T (in units of time slots) is exported. This includes the start time, end time, and duration of the links established between the remote sensing satellite and all ground stations and relay satellites; the longitude, latitude, and altitude of the remote sensing satellite; and the duration for which the remote sensing satellite is in sunlight.

(2b) The data are re-slotted with τ as the slot length. For each time slot, the sunlit time of the remote sensing satellite, the link on/off state and link duration τ_ih, and the satellite's position at the start of the slot are counted.
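The re-slotting in (2b) amounts to intersecting each exported time window with each slot. The STK export format is not described here, so the sketch takes plain (start, end) pairs in seconds from the start of the period and assumes that the windows for one quantity (for example, sunlight or one link) do not overlap. The function name is hypothetical.

```python
def per_slot_overlap(intervals, tau, n_slots, t0=0.0):
    """Seconds of each slot [t0 + i*tau, t0 + (i+1)*tau] covered by the intervals.

    intervals: list of non-overlapping (start, end) times, e.g. sunlit or
    link-access windows (the exact STK export format is not assumed).
    """
    out = [0.0] * n_slots
    for start, end in intervals:
        first = max(0, int((start - t0) // tau))               # first slot touched
        last = min(n_slots - 1, int((end - t0) // tau))        # last slot touched
        for i in range(first, last + 1):
            lo = t0 + i * tau
            hi = lo + tau
            out[i] += max(0.0, min(end, hi) - max(start, lo))  # overlap length
    return out
```

Applied to sunlight windows this gives the per-slot sunlit time; applied to a link's access windows it gives a per-slot link duration such as τ_ih, and a nonzero value marks the link as on.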

Step 3, initialize the parameters required by the intelligent resource joint scheduling method: the number of time slots per period T, the on-board battery capacity B_max, the battery capacity threshold B_min, the data memory capacity D_max, the static power consumption P_cons, the unit time slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic network parameter update interval T_copy, the Actor network parameter update interval T_train, the total number of training time slots I, the current time slot number i, and the discount factor γ.
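The scalar parameters of step 3 can be collected in one configuration object with basic sanity checks. Every default value below is a hypothetical placeholder, since the text does not give numbers; the weight matrices ω_critic and ω_actor are left out because they are arrays sized by the action sets rather than scalars.

```python
from dataclasses import dataclass


@dataclass
class SchedulerParams:
    """Scalar parameters listed in step 3; every default is a placeholder."""
    T: int = 1440              # time slots per topology period
    B_max: float = 1.0e5       # on-board battery capacity (J)
    B_min: float = 1.0e4       # battery capacity threshold (J)
    D_max: float = 8.0e10      # data memory capacity (bit)
    P_cons: float = 5.0        # static power consumption (W)
    tau: float = 60.0          # unit time slot length (s)
    epsilon: float = 0.5       # initial exploration rate
    alpha: float = 0.1         # initial learning rate
    T_copy: int = 50           # Critic parameter update interval (slots)
    T_train: int = 10          # Actor parameter update interval (slots)
    I: int = 100_000           # total number of training slots
    i: int = 0                 # current slot number
    gamma: float = 0.9         # discount factor

    def __post_init__(self):
        if not 0 <= self.B_min < self.B_max:
            raise ValueError("need 0 <= B_min < B_max")
        if not (0 <= self.epsilon <= 1 and 0 < self.gamma < 1):
            raise ValueError("epsilon must lie in [0, 1] and gamma in (0, 1)")
        if self.T_copy <= 0 or self.T_train <= 0:
            raise ValueError("update intervals must be positive")
```

Requiring γ < 1 matches the role of the discount factor in (1f), where it is what makes the return of a continuing task converge.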

Step 4, observe the state S_i and select an action P_i from the feasible action set with the ε-greedy strategy as the power allocation scheme of the current time slot.

Referring to FIG. 3, which is a sub-flow diagram of selecting actions according to the ε-greedy policy, the specific steps are as follows:

(4a) From state S_i, compute the feasible action set {A_f}_i of all actions P_i that satisfy the conditions; each action consists of a receiving power P_ir and a transmitting power P_it, as follows:

where C_it(P_it,H_i) denotes the link transmission rate at the start of the ith time slot given the current transmitting power P_it and channel parameter H_i, which can be calculated as follows:

where B_c (Hz) denotes the channel bandwidth.

(4b) For each state-action pair (S_i,P_i), compute the six-dimensional feature vector f_i(S_i,P_i). The feature vector is a function of the state and the action; each element is a value no greater than 1 that rates, under a different criterion, the quality of performing the action in the current state. The six dimensions are as follows.

Calculating the first dimension: the first dimension indicates whether the action accounts for the battery energy state, i.e., whether the energy consumed by performing the action can eliminate potential energy overflow caused by absorbing solar energy. Its feature function f₁(S_i,P_i) is expressed as follows:

wherein,

representing the energy consumption of the current time slot.

Calculating the second dimension: the second dimension indicates whether the action accounts for the data buffer state, i.e., whether the amount of data sent can eliminate potential data overflow caused by received data. Its feature function f₂(S_i,P_i) is expressed as follows:

wherein,

representing the amount of data received by the remote sensing satellite at the ith time slot,

D^R_max denotes the maximum amount of received data.

Third dimension: indicates whether the action is consistent with the optimal power allocation scheme. The satellite-to-ground link and the inter-satellite link are modeled differently, so the feature function f₃(S_i, P_i) must be expressed separately for each link condition.

Wherein,

consists of two parts, the received power and the transmit power. It is obtained with the Lagrange multiplier method by maximizing the total amount of data transmitted in the current and next time slots under the multidimensional resource constraints. Four cases are distinguished according to how the link switches.

wherein,

Can be expressed as follows:

wherein,

indicates that the current time is

denotes the maximum transmit power in the next time slot's feasible action set when the current time slot uses the power allocation [P_ir, 0]. Formula (4) is then applied again to bound the two cases, and the total amount of data transmitted over the two time slots can be expressed as follows:

For a switch from the inter-satellite link to the satellite-to-ground link, the optimal power allocation is solved with the Lagrange multiplier method.

Substituting the corresponding quantity into formulas (5)-(6) yields B′_i and D′_i, from which the optimal power is obtained as follows:

The optimal power is bounded by formula (4). The total amount of data transmitted over the two time slots can then be expressed as follows:

For a switch from the satellite-to-ground link to the inter-satellite link, the optimal power allocation scheme is likewise solved with the Lagrange multiplier method.

The optimal power is bounded by formula (4). The total amount of data transmitted over the two time slots can then be expressed as follows:

Finally, vary P_ir, compute the corresponding two-slot total data amount for each value, and compare these totals. The power allocation that maximizes the two-slot total is the optimal power allocation under the current link.
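The following toy sketch makes the "vary P_ir and keep the maximum" procedure concrete. It replaces the per-case Lagrange-multiplier solution with an exhaustive search over discrete power levels, which is a swapped-in technique. Its two-slot model is hypothetical:

- The rate is an assumed Shannon-type rate.
- Data are received at a fixed rate only when P_ir > 0, and received data can be sent only in the next slot.
- A single energy budget is shared by both slots.

The function and parameter names are illustrative.

```python
import math
from itertools import product


def two_slot_data(p_r, p1, p2, buffer_bits, h1, h2, tau1, tau2, bw, rx_rate, slot_s):
    """Total bits sent over the current and next slot under a toy buffer model."""
    sent1 = min(tau1 * bw * math.log2(1 + p1 * h1), buffer_bits)
    received = rx_rate * slot_s if p_r > 0 else 0.0  # reception only when P_ir > 0
    sent2 = min(tau2 * bw * math.log2(1 + p2 * h2), buffer_bits - sent1 + received)
    return sent1 + sent2


def best_allocation(rx_powers, tx_powers, energy_budget, slot_s, buffer_bits,
                    h1, h2, tau1, tau2, bw, rx_rate):
    """Exhaustive search for (P_ir, P_t1, P_t2) maximizing two-slot throughput."""
    best, best_val = None, -1.0
    for p_r in rx_powers:
        for p1, p2 in product(tx_powers, repeat=2):
            # Hypothetical energy model: P_ir is spent only in the current slot.
            if (p_r + p1 + p2) * slot_s > energy_budget:
                continue
            val = two_slot_data(p_r, p1, p2, buffer_bits, h1, h2,
                                tau1, tau2, bw, rx_rate, slot_s)
            if val > best_val:
                best, best_val = (p_r, p1, p2), val
    return best, best_val


if __name__ == "__main__":
    print(best_allocation([0, 1], [0, 1, 3], 10, 1, 1, 1, 1, 1, 1, 1, 5))  # ((1, 1, 3), 3.0)
```

In this toy case, the small buffer caps the current slot at 1 bit. Receiving (P_ir = 1) refills the buffer, so the second slot can also transmit. This is why P_ir has to be varied jointly with the transmit powers.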

Fourth dimension: indicates whether network resources are fully utilized when energy is abundant, so that energy is not wasted. Its feature function f₄(S_i, P_i) is expressed as follows:

wherein,

denotes the maximum energy that can be consumed by an action in the remote sensing satellite's feasible action set in the current time slot.

Fifth dimension: in the second dimension, the feature value corresponding to data overflow is defined as 0, because the data actually received does not match the energy spent, so energy is wasted. The fifth dimension complements the second: it indicates that when energy is abundant, the energy wasted due to data overflow is negligible. Its feature function f₅(S_i, P_i) is expressed as follows:

Sixth dimension: represents the received power allocation. Its feature function f₆(S_i, P_i) is expressed as follows:

(4c) Select the action P_i according to the ε-greedy policy. With probability 1 − ε, the remote sensing satellite selects the action that maximizes the approximate action-value function; with probability ε, it selects a feasible action at random, as follows:

wherein,

denotes the linear approximation of the action-value function, obtained by combining the feature vector with the weight vector.
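This is a minimal sketch of the linear approximation and the ε-greedy rule in (4c). It assumes that the approximate action-value function is the inner product of the weight vector ω_actor and the feature vector f_i(S_i, P_i), as the summary of the method describes. The feature function passed in is a hypothetical placeholder for the six feature functions above.

```python
import random


def q_value(weights, features):
    """Linear approximation: Q(S_i, P_i) ≈ ω · f_i(S_i, P_i)."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(actions, feature_fn, weights, epsilon, rng=random):
    """With probability ε explore uniformly; otherwise exploit the argmax of ω · f."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(weights, feature_fn(a)))


if __name__ == "__main__":
    # Hypothetical 6-dimensional features: only the first one is informative here.
    feature_fn = lambda a: [a[1] / 80, 0, 0, 0, 0, 0]
    w = [1.0, 0, 0, 0, 0, 0]
    print(epsilon_greedy([(0, 0), (0, 40), (30, 80)], feature_fn, w, epsilon=0.0))  # (30, 80)
```

Passing a seeded `random.Random` instance as `rng` makes the exploration branch reproducible during testing.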

Step 5, calculate the reward brought by this power allocation scheme as R_i = τ_ih·C_it(P_it, H_i). If i < I, proceed to the next step; otherwise, go to Step 10.
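This is a unit check on R_i = τ_ih·C_it(P_it, H_i). With τ_ih in seconds and C_it in bit/s, the reward is the number of bits delivered while the link is available in slot i. The numbers below are illustrative, not simulation results.

```python
def reward(tau_link_s: float, rate_bps: float) -> float:
    """R_i = τ_ih · C_it(P_it, H_i): bits delivered while the link is up in slot i."""
    return tau_link_s * rate_bps


# Illustrative values only: a link usable for 120 s of the slot at 200 Mbit/s.
print(reward(120.0, 200e6))  # 24000000000.0 bits, i.e. 24 Gbit (3 GB)
```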

Step 6, observe the pre-state S′_i and use the ε-greedy policy to select an action P′_i from the set of feasible actions as the pre-selected power allocation scheme for the next time slot. Then put the sample (f_i, P_i, R_i, f′_i, P′_i) into the experience memory for subsequent network parameter updates.

(6a) Observe the pre-state S′_i:

H′_i can be estimated as the historical average of the channel parameter, and the transitions of the states B_i and D_i can be expressed as follows:

where D′_i denotes the data buffer pre-state and B′_i denotes the battery energy pre-state.
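The transition formulas for B_i and D_i are not legible here. A plausible reading is a clipped energy and data balance, offered only as an assumption. The sketch below also estimates H′_i as the historical mean, as the text states. All argument names are hypothetical.

```python
from statistics import mean


def pre_state(battery_j, buffer_bits, harvested_j, consumed_j,
              received_bits, sent_bits, b_max_j, d_max_bits, h_history):
    """Return (B′_i, D′_i, H′_i) under an assumed clipped balance model."""
    # Assumed battery balance: charge + harvested solar energy - energy spent, within [0, B_max].
    b_next = min(max(battery_j + harvested_j - consumed_j, 0.0), b_max_j)
    # Assumed buffer balance: stored + received - sent, within [0, D_max].
    d_next = min(max(buffer_bits + received_bits - sent_bits, 0.0), d_max_bits)
    # H′_i: historical average of the channel parameter (as stated in the text).
    h_next = mean(h_history)
    return b_next, d_next, h_next


if __name__ == "__main__":
    print(pre_state(100, 50, 30, 20, 10, 40, 105, 100, [1.0, 3.0]))  # (105, 20, 2.0)
```

In the example, the battery would reach 110 but is clipped to B_max = 105. Under this reading, the discarded 5 units are the kind of energy overflow that the first feature dimension is meant to flag.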

(6b) Update the exploration rate ε′_i according to its update rule, where ε′_i = ε_{i+1}, i.e., the exploration rate that will be used in time slot i + 1.

(6c) Select an action P′_i from the set of feasible actions using the ε-greedy policy in the same way as in Step 4. The only difference is that the ε-greedy parameter ε_i used in Step 4 is replaced by ε′_i.

(6d) Put the sample (f_i, P_i, R_i, f′_i, P′_i) into the experience memory.
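This is a minimal experience memory for the samples (f_i, P_i, R_i, f′_i, P′_i), including the grouping by P_i that step (8a) uses. The memory here is an unbounded list. The text does not specify a capacity or eviction rule, so none is assumed. The class and method names are hypothetical.

```python
from collections import defaultdict, namedtuple

# One transition: features, action, reward, next features, pre-selected next action.
Sample = namedtuple("Sample", ["f", "p", "r", "f_next", "p_next"])


class ExperienceMemory:
    def __init__(self):
        self.samples = []

    def add(self, f, p, r, f_next, p_next):
        """Store a sample (f_i, P_i, R_i, f′_i, P′_i); features are frozen as tuples."""
        self.samples.append(Sample(tuple(f), p, r, tuple(f_next), p_next))

    def groups_by_action(self):
        """Group samples that share the same action P_i (step 8a); len(group) is M_P."""
        groups = defaultdict(list)
        for s in self.samples:
            groups[s.p].append(s)
        return dict(groups)
```

Actions are stored as (P_ir, P_it) tuples so they can serve directly as dictionary keys when grouping.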

Step 7, check whether i % T_copy = 0. If so, update the Critic network parameters by setting ω_critic = ω_actor and proceed to the next step; otherwise, proceed directly to the next step.

Step 8, check whether i % T_train = 0. If so, update ω_actor according to a gradient descent strategy and proceed to the next step; otherwise, proceed directly to the next step.

Referring to fig. 4, the specific implementation of this step is as follows:

(8a) In the experience memory, samples with the same P_i are placed in one group, and the number of samples in each group is denoted M_P.

(8b) For each group of samples, compute the cost function Y(ω_actor):

Wherein,

denotes the approximate target value function.

(8c) Apply gradient descent to the cost function with respect to ω_actor to update ω_actor:

denotes the learning rate of the current time slot.
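The expressions for Y(ω_actor) and the target value are not legible here. The sketch below assumes a common choice that is consistent with the stored samples:

- The target is R_i + γ·ω_critic·f′_i, which uses the pre-selected action P′_i and the periodically copied Critic weights.
- Y(ω_actor) is half the mean squared error over the M_P samples of a group.
- The update is ω_actor ← ω_actor − α·∇Y, matching the form ω_actor = ω_actor − Δω_actor.

It also includes the Step 7 copy ω_critic = ω_actor. This is a DQN/SARSA-style reading, not necessarily the authors' exact cost. The function names are hypothetical.

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def actor_update(w_actor, w_critic, group, gamma, alpha):
    """One gradient step on an assumed Y(ω_actor) for a group sharing P_i.

    group: list of (f_i, R_i, f′_i) tuples; Y = 1/(2·M_P) · Σ (target − ω_actor·f_i)².
    """
    m_p = len(group)
    grad = [0.0] * len(w_actor)
    for f, r, f_next in group:
        # Target built from the frozen Critic weights (assumption).
        td_error = r + gamma * dot(w_critic, f_next) - dot(w_actor, f)
        for k, fk in enumerate(f):
            grad[k] -= td_error * fk / m_p  # ∂Y/∂ω_k
    # Δω_actor = α·∇Y, so ω_actor = ω_actor − Δω_actor.
    return [w - alpha * g for w, g in zip(w_actor, grad)]


def maybe_sync_critic(i, t_copy, w_actor, w_critic):
    """Step 7: copy ω_actor into ω_critic whenever i % T_copy == 0."""
    return list(w_actor) if i % t_copy == 0 else w_critic
```

Keeping ω_critic frozen between copies means the target does not move with every Actor step. This is the usual motivation for maintaining two structurally identical parameter sets.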

Step 9, update the state, action and current time slot index of the remote sensing satellite network: S_{i+1} = S′_i, P_{i+1} = P′_i, i = i + 1. Then return to Step 5 for the next iteration.

Step 10, terminate the intelligent resource joint scheduling method under the uncertain-environment remote sensing satellite network, and output ω_critic for joint resource scheduling.

The invention can better solve the problem of optimizing remote sensing satellite transmission performance in a time-varying, unmeasured network environment. Based on reinforcement learning, it introduces a weight vector and establishes two independent networks with identical structure. Guided by the algorithm, the two networks continuously update their respective parameters from the values accumulated in experience, and the converged parameters finally provide guidance for multi-dimensional resource scheduling. The method is suitable for causal networks whose statistical characteristics are unknown. It not only handles the continuous network state space but also, to a certain extent, avoids overestimation in parameter updates. In short, the method can better adapt to future remote sensing satellite networks in dynamically and randomly changing environments and can provide guidance for their network planning and optimization.

The convergence and effectiveness of the present invention are explained below in conjunction with simulation experiments:

Example 7

The intelligent resource joint scheduling method under the uncertain-environment remote sensing satellite network is the same as in Embodiments 1-6.

Simulation conditions and content:

Simulation software: STK, MATLAB, Spyder;

Simulation scenario: the simulation scenario of the invention consists of 3 relay satellites, 6 ground stations and 1 remote sensing satellite.

Simulation parameters: assume that the satellite-to-ground transmit power set {A_tsg} is {0:1:80}, the inter-satellite transmit power set {A_tss} is {0:1:70}, and the received power set {A_r} is {0:30:30}. The satellite-to-ground link channel bandwidth is 250 MHz. The satellite is also assumed to have a fixed static power consumption of 10 W in every slot. If it chooses to receive data, its reception rate is always 100 Mbit/s. The learning-process parameters are set as follows: τ = 300 s, T = 288 slots, γ = 0.9, T_copy = 3×T slots, T_train = 2×T slots, I = 10002×T slots, and B_min = 0.6×B_max.
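The simulation settings can be collected into one configuration object, sketched below. The power sets are read as MATLAB-style start:step:stop ranges, since the simulation used MATLAB; this reading is an assumption. Their units are assumed to be W because the text does not state them. B_max and D_max are not given, so B_min is computed from whatever B_max is supplied. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


def matlab_range(start, step, stop):
    """Expand MATLAB-style start:step:stop (inclusive) into a list."""
    return list(range(start, stop + 1, step))


@dataclass
class SimConfig:
    tau_s: int = 300                 # unit slot length τ (s)
    slots_per_period: int = 288      # T
    gamma: float = 0.9               # discount factor γ
    p_cons_w: float = 10.0           # static power consumption per slot (W)
    bandwidth_hz: float = 250e6      # satellite-to-ground channel bandwidth B_c
    rx_rate_bps: float = 100e6       # fixed reception rate when receiving
    # Power sets; units assumed to be W (not stated in the text).
    tx_sg: list = field(default_factory=lambda: matlab_range(0, 1, 80))  # {A_tsg}
    tx_ss: list = field(default_factory=lambda: matlab_range(0, 1, 70))  # {A_tss}
    rx: list = field(default_factory=lambda: matlab_range(0, 30, 30))    # {A_r}

    @property
    def t_copy(self):
        return 3 * self.slots_per_period

    @property
    def t_train(self):
        return 2 * self.slots_per_period

    @property
    def total_slots(self):
        return 10002 * self.slots_per_period

    def b_min(self, b_max):
        """B_min = 0.6 × B_max; B_max itself is not given in the text."""
        return 0.6 * b_max
```

With these values, one period spans 288 × 300 s = 86 400 s, i.e. one day. The Critic parameters are therefore copied every three simulated days and the Actor parameters are trained every two. {0:30:30} expands to just [0, 30], meaning the satellite either does not receive or receives at a single power level.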

Simulation content: using the above simulation scenario and software and the network topology shown in fig. 2, the convergence of the proposed method is demonstrated first. Then, taking the data buffer capacity as the resource variable, the effectiveness of the proposed method is demonstrated by comparison with three other methods.

Simulation result analysis:

Referring to fig. 5, which shows the cost value curves obtained by simulation: the abscissa is the Actor network parameter update slot and the ordinate is the cost value. The dotted line shows the cost value of the satellite-to-ground link and the solid line that of the inter-satellite link. Fig. 5 uses the cost value to measure how closely the approximate action-value function approximates the action-value function. As fig. 5 shows, the cost values of both links first decrease and then converge as learning proceeds. This is because the Actor network parameters of both links are updated by gradient descent, i.e., always in the direction opposite to the gradient, so the error between the approximate action-value function and the action-value function gradually shrinks. In addition, sample generation is influenced by the ε-greedy policy, so overall the cost value decreases gradually while fluctuating. After learning has proceeded for some time, the exploration rate and learning rate become small and the network parameters change little between updates; in the figure, this appears as convergence of the cost value. The simulated cost value curves agree with the theoretical analysis and verify the convergence of the method.

Example 8

The intelligent resource joint scheduling method under the uncertain-environment remote sensing satellite network is the same as in Embodiments 1-6, and the simulation conditions and content are the same as in Example 7.

Referring to fig. 6, which compares the average period rewards obtained by simulation: the abscissa is the data buffer capacity and the ordinate is the average period reward of the remote sensing satellite network. The solid dot line represents the intelligent resource joint scheduling method proposed by the invention, the solid dot line represents the greedy resource joint scheduling method, the solid square line represents the Q-learning resource joint scheduling method, and the solid triangle line represents the random resource joint scheduling method. Using the average amount of data transmitted per period as the performance index, fig. 6 shows how changes in the data buffer margin affect network performance and how the four methods compare. Overall, the proposed method performs best, followed by the greedy method and then the Q-learning method, while the random method performs worst. With the resources in the other two dimensions fixed, the performance of all four methods first rises and then levels off as the data buffer margin grows. This saturation is caused by the resource limits in the other dimensions. Notably, the proposed method and the greedy method perform similarly when the data buffer margin is small. In that regime, transmitting the buffered data requires little energy, the other resources are relatively abundant, and the energy demand can always be met. The proposed method therefore gains little from reusing stored energy and performs much like the greedy method.

In short, the invention discloses an intelligent resource joint scheduling method under an uncertain-environment remote sensing satellite network. It solves the problem of optimizing remote sensing satellite transmission performance in a time-varying, unmeasured network environment. The implementation comprises the following steps:

- establishing an uncertain-environment remote sensing satellite network model;
- generating environmental parameter data;
- initializing the parameters required by the method;
- guiding the satellite to perform power allocation;
- pre-transferring the state of the remote sensing satellite network;
- guiding the satellite to perform power pre-allocation;
- updating and checking the Critic network parameter ω_critic;
- updating and checking the Actor network parameter ω_actor;
- updating the state, action and current time slot index of the remote sensing satellite network;
- obtaining the network parameter ω_critic that guides joint scheduling, thereby providing guidance for multi-dimensional resource scheduling.

The invention fully considers the task transmission characteristics of remote sensing satellites and the particularities of their network environment, and obtains environmental data parameters at a given scale and parameter setting through software simulation. In addition, based on reinforcement learning, the invention defines six-dimensional feature functions and linearly approximates the action-value function with a weight vector. Two independent, structurally identical networks continuously update their respective parameters from the values accumulated in experience. The method is suitable for remote sensing satellite networks with unknown statistical characteristics. It not only handles the continuity of the network state space but also, to a certain extent, avoids overestimation in parameter updates.
The invention can better adapt to future remote sensing satellite networks in dynamically and randomly changing environments and provides guidance for their network planning and optimization.

Claims

1. An intelligent resource joint scheduling method under an uncertain-environment remote sensing satellite network, characterized in that the established network model suits the environment and the resource scheduling scenario of the remote sensing satellite network, and reinforcement learning is used to avoid directly solving a high-complexity planning problem and to cope with a continuous, infinite state space; the method comprises the following steps:

Four parts; according to ITU-R Recommendations P.618-13, P.838 and P.839, dynamic channel models of the satellite-to-ground and inter-satellite links are established, and the channel parameter H_i is obtained through simulation; considering the orbital characteristics of the satellite, a dynamic energy harvesting model is established, and the absorbed solar energy is obtained through simulation

And

otherwise,

(3) initializing the parameters required by the intelligent resource joint scheduling method: these parameters comprise the number of time slots per period T, the on-board battery capacity B_max, the battery capacity threshold B_min, the data memory capacity D_max, the static power consumption P_cons, the unit time slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic network parameter update interval T_copy, the Actor network parameter update interval T_train, the total number of training time slots I, the current time slot index i, and the discount factor γ;

(4) guiding the satellite to perform power allocation: observe the state S_i; for each feasible action, extract the feature vector f_i(S_i, P_i) of the state-action pair through the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor and use the ε-greedy strategy to select an action P_i from the feasible action set as the power allocation scheme for the current time slot, thereby guiding the satellite's power allocation;

(6) guiding the satellite to perform power pre-allocation: observe the pre-state S′_i; for each feasible action, extract the feature vector f′_i(S′_i, P′_i) of the state-action pair through the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor and use the ε-greedy strategy to select an action P′_i from the feasible action set as the pre-selected power allocation scheme for the next time slot; and put the sample (f_i, P_i, R_i, f′_i, P′_i) into the experience memory for subsequent network parameter updates;

(7) updating and checking the Critic network parameter ω_critic: take the remainder of the current time slot index i divided by the Critic network parameter update interval T_copy; if i % T_copy = 0, update the Critic network parameter by setting ω_critic = ω_actor and proceed to the next step; otherwise, proceed directly to the next step;

(9) updating the state, action and current time slot index of the remote sensing satellite network: S_{i+1} = S′_i, P_{i+1} = P′_i, i = i + 1; one iteration is then complete, and the method returns to step (5);

2. The intelligent resource joint scheduling method under an uncertain-environment remote sensing satellite network according to claim 1, wherein using the ε-greedy strategy to select an action P_i from the feasible action set when guiding the satellite to perform power allocation in step (4) comprises the following steps:

(4a) computing the feasible action set {A_f}_i: from the state S_i of the remote sensing satellite network, compute the feasible action set {A_f}_i, i.e., all actions P_i that satisfy the following conditions; each action consists of two parts, the received power P_ir and the transmit power P_it:

where τ_ih denotes the duration of the link at the start of the i-th time slot, and C_it(P_it, H_i) denotes the link transmission rate at the start of the i-th time slot under the current transmit power P_it and channel parameter H_i, calculated as follows:

here, B_c (Hz) denotes the channel bandwidth;

(4b) computing the six-dimensional feature vector f_i(S_i, P_i): for each state-action pair (S_i, P_i), compute the feature vector f_i(S_i, P_i) with the six feature functions; each element of the feature vector uses a value not exceeding 1 to represent, in its dimension, the quality of the action P_i executed in state S_i; the feature vector is a function of state and action;

wherein,

3. The intelligent resource joint scheduling method according to claim 1, wherein calculating the six-dimensional feature vector f_i(S_i, P_i) in step (4b) specifically considers the following six dimensions:

(4b1) calculating the first dimension: the first dimension indicates whether the action takes the battery energy status into account, i.e., whether the energy consumed by performing the action can eliminate the potential energy overflow caused by absorbed solar energy. Its feature function f₁(S_i, P_i) is expressed as follows:

wherein,

representing the energy consumption of the current time slot.

(4b2) calculating the second dimension: the second dimension indicates whether the action takes the data buffer status into account, i.e., whether the amount of data sent can eliminate the potential data overflow caused by received data. Its feature function f₂(S_i, P_i) is expressed as follows:

wherein,

indicating the amount of data received by the satellite at the ith time slot,

is the amount of data transmitted by the satellite at the ith time slot,

D^R_max denotes the maximum amount of received data;

(4b3) calculating the third dimension: the third dimension indicates whether the action is consistent with the optimal power allocation. Because the satellite-to-ground link and the inter-satellite link are modeled differently, the feature function f₃(S_i, P_i) must be expressed separately for each link condition.

Wherein,

wherein,

denotes the maximum transmit power in the current feasible action set,

Can be expressed as follows:

wherein,

denotes the maximum transmit power in the feasible action set after the transition to the next time slot, given the power allocation used in the current time slot.

where B′_i and D′_i denote the resources that remain available at the current time once the resource allocation for the next time slot is known; they can be calculated as follows:

denotes the maximum transmit power in the next time slot's feasible action set when the current time slot uses the power allocation [P_ir, 0]. Formula (1) is then applied again to bound the two cases, and the total amount of data transmitted over the two time slots is expressed as follows:

Substituting the corresponding quantity into formulas (2)-(3) yields B′_i and D′_i, from which the optimal power is obtained as follows:

the optimal power is bounded by formula (1), and the total amount of data transmitted over the two time slots is then expressed as follows:

for a switch from the satellite-to-ground link to the inter-satellite link, the optimal power allocation is solved with the Lagrange multiplier method;

the optimal power is bounded by formula (1), and the total amount of data transmitted over the two time slots can then be expressed as follows:

in summary, vary P_ir, compute the corresponding two-slot total data amount for each value, and compare these totals; the power allocation that maximizes the two-slot total is the optimal power allocation under the current link;

(4b4) calculating the fourth dimension: the fourth dimension indicates whether network resources are fully utilized when energy is abundant, so that energy is not wasted; its feature function f₄(S_i, P_i) is expressed as follows:

wherein,

representing the maximum energy which can be consumed in the feasible action set of the current time slot of the remote sensing satellite;

(4b5) calculating the fifth dimension: in the second dimension, the feature value corresponding to data overflow is defined as 0, because the data actually received does not match the energy spent, so energy is wasted; the fifth dimension complements the second, indicating that when energy is abundant, the energy wasted due to data overflow is negligible; its feature function f₅(S_i, P_i) is expressed as follows:

(4b6) calculating the sixth dimension: the sixth dimension represents the received power allocation, and its feature function f₆(S_i, P_i) is expressed as follows:

where f₆(S_i, P_i) is the sixth feature function for the i-th time slot;

4. The intelligent resource joint scheduling method according to claim 1, wherein step (6) uses the same ε-greedy policy as step (4), and the exploration rate ε′_i used in step (6) is updated according to ε′_i = ε_{i+1}, and

5. The intelligent resource joint scheduling method according to claim 1, wherein updating ω_actor according to the gradient descent strategy in step (8) comprises the following steps:

(8a) grouping: samples (f_i, P_i, R_i, f′_i, P′_i) in the experience memory that share the same P_i are placed in one group, and the number of samples in each group is denoted M_P;

Wherein,

representing an approximate objective cost function;

ω_actor = ω_actor − Δω_actor