CN112422171A - Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network - Google Patents


Info

Publication number
CN112422171A
Authority
CN
China
Prior art keywords
remote sensing
satellite
network
action
sensing satellite
Prior art date
Legal status
Granted
Application number
CN202011251365.3A
Other languages
Chinese (zh)
Other versions
CN112422171B (en)
Inventor
周笛
王怡昕
盛敏
李建东
吴家鑫
戴诺伊
王晨光
白卫岗
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202011251365.3A
Publication of CN112422171A
Application granted
Publication of CN112422171B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851 Systems using a satellite or space-based relay
    • H04B7/18513 Transmission in a satellite or space-based system
    • H04B7/18519 Operations control, administration or maintenance
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0446 Resources in time domain, e.g. slots or frames
    • H04W72/0473 Wireless resource allocation based on the type of the allocated resource, the resource being transmission power

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Radio Relay Systems (AREA)

Abstract

The invention discloses an intelligent joint resource scheduling method for remote sensing satellite networks in an uncertain environment, which addresses the problem of optimizing remote sensing satellite transmission performance in a time-varying network environment that cannot be measured in advance. The implementation comprises the following steps: establishing a remote sensing satellite network model with an uncertain environment; generating environmental parameter data; initializing the required parameters; guiding satellite power allocation; pre-transitioning the network state; guiding satellite power pre-allocation; updating and checking the network parameters; updating the network state, the action and the current time slot index; and obtaining the network parameters that guide multi-dimensional resource scheduling. The invention obtains network environment parameter data for a given network scale and parameter set through software simulation, defines a six-dimensional feature function, and linearly approximates the action-value function with a weight vector. The invention thereby handles the continuous network state space, mitigates overestimation during parameter updates, adapts to future remote sensing satellite networks operating in dynamically and randomly varying environments, and provides guidance for network planning and optimization.

Description

Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network
Technical Field
The invention relates to the technical field of satellite communication, mainly to joint resource scheduling in remote sensing satellite networks, and in particular to an intelligent dynamic joint resource scheduling method for remote sensing satellite networks in an uncertain environment. It can be used for remote sensing satellite networks operating in time-varying and unpredictable environments.
Background
Compared with terrestrial communication networks, satellite networks offer long communication distances, high communication quality, and service that is neither limited by geography nor affected by natural disasters. In recent years, with the rapidly growing demand for timely, high-precision and high-utility remote sensing data, the state has continuously increased its investment in and construction of satellite remote sensing services, and remote sensing satellite networks have shown increasing social and economic value, close ties to the national economic development strategy, and great development potential. They are now widely used in agricultural assessment, ecological and environmental monitoring, weather forecasting, and disaster prevention and mitigation, and are closely tied to everyday life. A remote sensing satellite network consists of remote sensing satellites, relay satellites and ground stations. Remote sensing data are acquired by a remote sensing satellite and transmitted to a ground station either directly or indirectly via a relay satellite. Resources are the basis of remote sensing satellite network operation, and resource scheduling directly affects transmission performance, so research on multi-dimensional joint resource scheduling methods is of great significance.
Because of its orbital motion, a remote sensing satellite periodically passes between the sunlit side and the shadow side of the Earth. On the sunlit side, solar energy is available, but the amount supplied is random and unpredictable owing to factors such as solar sail (panel) losses and ion radiation. On the shadow side, no solar energy is available and power can be supplied only by the on-board battery. Meanwhile, during normal operation the remote sensing satellite incurs both routine static energy consumption, to keep its systems running, and dynamic energy consumption directed by resource scheduling, including task acquisition and transmission. In addition, because satellites follow different orbits, the topology of the remote sensing satellite network is another time-varying factor that seriously affects task transmission. In summary, how to design an efficient multi-dimensional joint resource scheduling method that optimizes the long-term performance of the network is an important problem that urgently needs to be studied.
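The energy behaviour described above can be captured by simple per-slot bookkeeping. The sketch below is one plausible version, not the patent's model: it assumes that the energy drawn in a slot equals (static power + receive power + transmit power) times the slot length, that harvested energy is zero on the shadow side, and that charge above capacity is lost. `EnergyParams`, `next_battery` and `feasible` are hypothetical names; `b_max` and `b_min` play the roles of the battery capacity and threshold defined later in the description.

```python
from dataclasses import dataclass


@dataclass
class EnergyParams:
    b_max: float   # battery capacity (J)
    b_min: float   # battery threshold that must not be crossed (J)
    p_cons: float  # static power consumption (W)
    tau: float     # slot length (s)


def next_battery(b: float, harvested: float, p_rx: float, p_tx: float,
                 prm: EnergyParams) -> float:
    """Battery level after one slot (assumed accounting).

    harvested: solar energy collected during the slot, 0 on the Earth's shadow side.
    Energy used = (static + receive + transmit power) * slot length.
    Charge above b_max is assumed to be lost because the battery is full.
    """
    used = (prm.p_cons + p_rx + p_tx) * prm.tau
    return min(prm.b_max, b + harvested - used)


def feasible(b: float, harvested: float, p_rx: float, p_tx: float,
             prm: EnergyParams) -> bool:
    """A power choice is allowed only if the battery stays at or above b_min."""
    return next_battery(b, harvested, p_rx, p_tx, prm) >= prm.b_min


# Hypothetical numbers: 200 J battery, 30 J threshold, 1 W static load, 10 s slots.
prm = EnergyParams(b_max=200.0, b_min=30.0, p_cons=1.0, tau=10.0)
print(next_battery(100.0, 0.0, 2.0, 3.0, prm))   # shadow side, moderate powers
print(feasible(100.0, 0.0, 2.0, 5.0, prm))       # higher transmit power
print(next_battery(190.0, 50.0, 0.0, 0.0, prm))  # sunlit side, idle: clipped at b_max
```

Checking feasibility before acting is what keeps the battery above its threshold during long shadow periods, when no energy arrives.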
Existing resource scheduling methods fall mainly into static and dynamic types. Static algorithms require all unknown environmental data over the service period, including solar energy arrivals and channel conditions, to be obtained before task transmission, and then solve the resource scheduling problem by global planning. Dynamic algorithms do not require predicted data and can be further divided into two categories depending on whether they rely on statistical characteristics. The first category includes the BIBM algorithm proposed by D. Zhou et al. in "Session QoS and Satellite Service Lifetime handoff in Remote Sensing Satellite Networks", published in IEEE Wireless Communications Letters; it jointly considers task acquisition, processing and transmission, and solves the resource scheduling problem using a state transition probability model. The second category includes the Approximate SARSA algorithm described by A. Ortiz et al. in "Reinforcement learning for energy harvesting point-to-point communications", 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, which obtains the optimal resource scheduling policy from causal data alone, without requiring knowledge of the statistical characteristics.
These two classes of methods essentially cover the current state of research on resource scheduling. Static algorithms are overly idealized, and because they are non-causal they cannot be applied to remote sensing satellite networks. The first class of dynamic algorithms is equally unsuitable, since remote sensing satellite networks do not have fixed, deterministic statistical characteristics. The second class of dynamic algorithms can be used to solve the resource scheduling problem of remote sensing satellite networks, but current research mostly targets generic energy harvesting systems and does not consider the task flow of remote sensing satellite networks or the particular environment in which they operate.
Disclosure of Invention
In view of the shortcomings and limitations of the prior art, the invention aims to design an intelligent multi-dimensional joint resource scheduling method that does not depend on known, fixed statistical characteristics and suits the environment and resource scheduling scenario of remote sensing satellite networks in an uncertain environment.
The intelligent joint resource scheduling method for remote sensing satellite networks in an uncertain environment according to the invention is characterized in that the established network model fits the environment and the resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids both directly solving a highly complex planning problem and the difficulty of a continuous, infinite state space. The method comprises the following steps:
(1) Establish the remote sensing satellite network model under an uncertain environment: first determine the scale and parameters of the remote sensing satellite network, including the number and positions of the remote sensing satellites and ground stations; then define the state set S, the action set A, the reward R and the action-value function $Q^{\pi}(S_i, P_i)$ of the remote sensing satellite network.
The state set is $S = \{B \times D \times H \times E^H\}$. At the start of the i-th time slot, the network state $S_i$ consists of four parts: the current battery charge $B_i$, the amount of data $D_i$ already in the data buffer, the channel parameter $H_i$, and the absorbed solar energy $E_i^H$. Dynamic channel models of the satellite-to-ground and inter-satellite links are established according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and the channel parameter $H_i$ is obtained by simulation. Taking the orbital characteristics of satellite operation into account, a dynamic energy harvesting model is established and the absorbed solar energy $E_i^H$ is obtained by simulation.
The action set $A = \{A_r \times A_t\}$ consists of two parts, the receive power $A_r$ and the transmit power $A_t$, which can be written as
$A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$
and
$A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$,
where δ is the step size, 0 means that no data is received or transmitted, and $P_{MAX}$ is the maximum power value. When the transmission link is a satellite-to-ground link, [formula image]; otherwise, [formula image].
The reward R is the amount of data sent by the satellite at the start of the time slot. The action-value function $Q^{\pi}(S_i, P_i)$ is the expected return obtained when the agent, guided by policy π, performs action $P_i$ in state $S_i$. This completes the establishment of the remote sensing satellite network model under an uncertain environment;
(2) Generate the environmental parameter data: simulate the remote sensing satellite network model in STK to export raw environmental parameter data for one topology period, and process the raw data in MATLAB to obtain, for each time slot, the link on/off state, the link connection duration, the remote sensing satellite position and the time spent in sunlight; these data serve as the environmental parameter data of the intelligent joint resource scheduling method;
(3) Initialize the parameters required by the intelligent joint resource scheduling method: the number of time slots per period T, the on-board battery capacity $B_{max}$, the battery charge threshold $B_{min}$, the data storage capacity $D_{max}$, the static power consumption $P_{cons}$, the unit time slot length τ, the exploration rate ε, the Critic network parameters $\omega_{critic}$, the Actor network parameters $\omega_{actor}$, the learning rate α, the Critic parameter update interval $T_{copy}$, the Actor parameter update interval $T_{train}$, the total number of training time slots I, the current time slot index i, and the discount factor γ;
(4) Guide the satellite's power allocation: observe the state $S_i$; for each feasible action, extract the feature vector $f_i(S_i, P_i)$ of the state-action pair through the defined six-dimensional feature function, which reflects the working characteristics of the remote sensing satellite and the influence of the environment, and evaluate it with the Actor network parameters $\omega_{actor}$; then use an ε-greedy policy to select an action $P_i$ from the feasible action set as the power allocation scheme for the current time slot, guiding the satellite's power allocation;
(5) Pre-transition the state of the remote sensing satellite network: compute the reward $R_i$ in the remote sensing satellite network model under an uncertain environment and check whether the iteration is finished: if i = I, go to step (10); otherwise continue with the next step and execute a new iteration;
(6) Guide the satellite's power pre-allocation: observe the pre-state $S'_i$; for each feasible action, extract the feature vector $f'_i(S'_i, P'_i)$ of the state-action pair through the defined six-dimensional feature function, and, using the Actor network parameters $\omega_{actor}$, select an action $P'_i$ from the feasible action set with the ε-greedy policy as the preselected power allocation scheme for the next time slot; store the sample $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory for subsequent network parameter updates;
(7) Update check for the Critic network parameters $\omega_{critic}$: take the current time slot index i modulo the Critic parameter update interval $T_{copy}$; if $i \bmod T_{copy} = 0$, update the Critic network parameters by $\omega_{critic} = \omega_{actor}$ and proceed to the next step; otherwise proceed directly to the next step;
(8) Update check for the Actor network parameters $\omega_{actor}$: take the current time slot index i modulo the Actor parameter update interval $T_{train}$; if $i \bmod T_{train} = 0$, update the Actor network parameters $\omega_{actor}$ according to the gradient descent strategy and proceed to the next step; otherwise proceed directly to the next step;
(9) Update the state, the action and the current time slot index of the remote sensing satellite network: $S_{i+1} = S'_i$, $P_{i+1} = P'_i$, $i = i + 1$; this completes one iteration; then return to step (5);
(10) Obtain the network parameters $\omega_{critic}$ that guide joint scheduling: output the network parameters $\omega_{critic}$ obtained by training with the intelligent joint resource scheduling method for remote sensing satellite networks in an uncertain environment, which then ends. In practical applications, a joint resource scheduling scheme is generated from these parameters with a greedy policy (the ε-greedy policy with ε = 0).
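Steps (4) to (10) form an online training loop. The sketch below strings them together under heavy assumptions: a hypothetical `ToyEnv` stands in for the STK/MATLAB-driven model of steps (1) and (2), its six features and rewards are placeholders rather than the patent's feature function, and the Actor update of step (8) is taken to be a semi-gradient TD step toward $R_i + \gamma\,\omega_{critic}^{\top} f'_i$ (the text only says "gradient descent strategy"). All function and parameter names are illustrative.

```python
from collections import deque

import numpy as np


class ToyEnv:
    """HYPOTHETICAL stand-in for the network model of steps (1)-(2)."""

    def __init__(self, n_actions: int, rng: np.random.Generator):
        self.n_actions = n_actions
        self.rng = rng

    def features(self, state: np.ndarray) -> np.ndarray:
        # Placeholder six-dimensional features, one row per feasible action.
        return np.tanh(state[None, :] + 0.1 * np.arange(self.n_actions)[:, None])

    def step(self, action: int) -> tuple:
        reward = 0.1 * action                # placeholder for the data volume sent
        pre_state = self.rng.normal(size=6)  # placeholder for the pre-state S'_i
        return reward, pre_state


def choose(feats: np.ndarray, w: np.ndarray, eps: float, rng: np.random.Generator) -> int:
    """epsilon-greedy over the linear estimates feats @ w."""
    if rng.random() < eps:
        return int(rng.integers(len(feats)))
    return int(np.argmax(feats @ w))


def train(total_slots=200, t_copy=20, t_train=5, alpha=0.05, gamma=0.9,
          eps=0.1, batch=16, capacity=100, seed=0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    env = ToyEnv(n_actions=5, rng=rng)
    w_actor = np.zeros(6)                       # step (3): initial weights
    w_critic = w_actor.copy()
    memory = deque(maxlen=capacity)             # when full, the oldest sample is evicted
    state = rng.normal(size=6)
    feats = env.features(state)
    action = choose(feats, w_actor, eps, rng)   # step (4)
    i = 1
    while True:
        reward, pre_state = env.step(action)    # step (5): reward R_i and pre-state S'_i
        if i == total_slots:
            break                               # i == I: go to step (10)
        pre_feats = env.features(pre_state)
        pre_action = choose(pre_feats, w_actor, eps, rng)  # step (6)
        memory.append((feats[action], action, reward, pre_feats[pre_action], pre_action))
        if i % t_copy == 0:                     # step (7): hard copy Actor -> Critic
            w_critic = w_actor.copy()
        if i % t_train == 0 and len(memory) >= batch:      # step (8)
            idx = rng.choice(len(memory), size=batch, replace=False)
            samples = [memory[int(k)] for k in idx]
            f = np.array([s[0] for s in samples])
            r = np.array([s[2] for s in samples])
            f_next = np.array([s[3] for s in samples])
            td = r + gamma * (f_next @ w_critic) - f @ w_actor
            w_actor = w_actor + alpha * (td[:, None] * f).mean(axis=0)
        state, feats, action = pre_state, pre_feats, pre_action  # step (9)
        i += 1
    return w_critic                             # step (10)


print(train())
```

The bounded `deque` mirrors the detailed description's rule that new samples replace old ones once the experience memory is full, and the returned weights correspond to the output of step (10).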
For remote sensing satellite networks, the invention on the one hand jointly considers the task transmission flow of the network and its environmental characteristics, and builds a relatively comprehensive remote sensing satellite network model. On the other hand, at the design stage it solves the optimization problem with reinforcement learning, which reduces complexity; by defining a feature function and a weight vector and using linear approximation, it handles the continuous state space of the remote sensing satellite network, searches for the optimal resource scheduling scheme on the basis of an accurate state characterization, and improves the accuracy of the results.
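For a single decision, the linear approximation works as follows: enumerate $A_r \times A_t$, map each candidate to a feature vector, and score it as $\omega^{\top} f(S_i, P_i)$. With ε = 0 this is the greedy policy used in deployment. The sketch assumes that $A_r$ and $A_t$ share the step δ and that each maximum is a multiple of δ; `toy_features` is a hypothetical placeholder, not the patent's six-dimensional feature function, and all numbers are made up.

```python
from itertools import product

import numpy as np


def power_levels(delta: float, p_max: float) -> list:
    """Discrete powers {0, delta, 2*delta, ..., p_max}; 0 means idle.

    Assumes p_max is an integer multiple of delta.
    """
    return [k * delta for k in range(round(p_max / delta) + 1)]


def toy_features(state: np.ndarray, action: tuple) -> np.ndarray:
    """HYPOTHETICAL six-dimensional features of a (state, action) pair."""
    battery, buffer = state[0], state[1]
    p_rx, p_tx = action
    return np.array([battery, buffer, p_rx, p_tx, p_rx * p_tx, 1.0])


def greedy_action(state: np.ndarray, actions: list, feature_fn, w: np.ndarray):
    """Return the action with the largest linear estimate w^T f(S, P)."""
    scores = [float(w @ feature_fn(state, a)) for a in actions]
    return actions[int(np.argmax(scores))]


# A_r x A_t with step 5 W, receive cap 20 W, transmit cap 10 W (hypothetical).
actions = list(product(power_levels(5.0, 20.0), power_levels(5.0, 10.0)))
w = np.array([0.0, 0.0, -0.1, 0.4, 0.0, 0.0])  # placeholder weights
print(len(actions), greedy_action(np.array([0.8, 0.5]), actions, toy_features, w))
```

Only the six weights are stored, however many distinct states occur, which is why the text says this approach avoids the storage cost of discretizing the state space.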
The invention helps the remote sensing satellite balance battery resources against data transmission in a dynamic environment, ensures efficient task transmission in the remote sensing satellite network, and improves transmission performance.
Compared with the prior art, the invention has the following advantages:
The resource scheduling reflects the characteristics of satellite operation: the design of the joint resource scheduling method fully takes into account satellite attributes and working characteristics, such as the static energy consumption that on-board thermal control, satellite housekeeping and other subsystems require to keep the remote sensing satellite operating normally. The satellite's multi-dimensional resources are considered jointly, reflecting the resource scheduling characteristics of joint task acquisition and transmission by the remote sensing satellite, and the optimal resource scheduling scheme is determined on that basis.
The scheduling method is closer to real remote sensing satellite networks: the invention builds a remote sensing satellite network model, obtains through STK simulation the environmental and position data of the network operating over one topology period with given parameters and scale, and processes the raw data in MATLAB to obtain time-slotted environmental and position parameters, which makes the scenario of the intelligent joint resource scheduling method more realistic.
Complex constrained planning problems are solved with reinforcement learning: the invention uses reinforcement learning so that solving the multi-dimensional joint resource scheduling problem does not depend on any non-causal data or statistical characteristics. To handle the continuous, infinite state space, the design introduces a six-dimensional feature vector reflecting the environmental characteristics of the remote sensing satellite network and the task transmission characteristics of the remote sensing satellite, maps each state-action pair to this feature vector, and evaluates the quality of an action by linear approximation with a weight vector. This handles the infinite state space of the remote sensing satellite network and avoids the storage burden that state discretization would cause. In addition, the invention constructs two independent networks with the same structure but different parameters, which mitigates overestimation during parameter updates to some extent.
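The text names a gradient descent strategy for the Actor parameters and a periodic copy into the Critic parameters, but it does not state the loss being minimized. One formulation consistent with the stored samples $(f_i, P_i, R_i, f'_i, P'_i)$, assumed here and modeled on approximate SARSA with a separate target parameter set, is the following, where $\mathcal{B}$ is a minibatch of $N$ samples drawn from the experience memory and the Actor step is taken whenever $i \bmod T_{train} = 0$:

```latex
\begin{aligned}
\hat{Q}(S, P; \omega) &= \omega^{\top} f(S, P) \\
y_j &= R_j + \gamma\, \omega_{critic}^{\top} f'_j \\
e_j &= y_j - \omega_{actor}^{\top} f_j \\
L(\omega_{actor}) &= \frac{1}{2N} \sum_{j \in \mathcal{B}} e_j^{2} \\
\omega_{actor} &\leftarrow \omega_{actor} + \frac{\alpha}{N} \sum_{j \in \mathcal{B}} e_j\, f_j \\
\omega_{critic} &\leftarrow \omega_{actor} \quad \text{whenever } i \bmod T_{copy} = 0
\end{aligned}
```

The update line is plain gradient descent on $L$ with $y_j$ held fixed, since $\partial L / \partial \omega_{actor} = -\frac{1}{N}\sum_{j} e_j f_j$. Because $y_j$ is computed with $\omega_{critic}$, which changes only every $T_{copy}$ slots, the target does not move with every Actor update; this decoupling of the two parameter sets is the mechanism the text credits with limiting overestimation.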
Drawings
FIG. 1 is an overall flow diagram of an implementation of the present invention;
FIG. 2 is a schematic diagram of a remote sensing satellite network model in the present invention;
FIG. 3 is a sub-flow diagram of action selection based on the ε-greedy policy in the present invention;
FIG. 4 is a sub-flow diagram of the gradient descent strategy of the present invention;
FIG. 5 is a graph of cost values in the present invention;
FIG. 6 is a comparison graph of average periodic rewards in the present invention.
The invention is described in detail below with reference to the drawings and examples.
Detailed Description
Example 1
The particularity and dynamic variability of the remote sensing satellite network environment, together with the diversity of the remote sensing satellite's energy consumption, set remote sensing satellite networks apart from other communication networks. There is a large body of research on resource scheduling for remote sensing satellite networks, and the methods can be classified as static or dynamic according to whether environmental data must be predicted. Static algorithms work under known conditions, meaning that the environmental data for all future times must be known before the satellite starts a transmission task. Although static algorithms provide an upper bound on the performance of the remote sensing satellite network, they are overly idealized and non-causal, so they apply to few scenarios and cannot serve most real-world situations. Dynamic algorithms work in an unknown environment, meaning that the remote sensing satellite does not need any environmental data in advance, and they can be further divided into two classes. The first class is dynamic programming, which, based on a Markov decision process model, can solve such problems when statistical characteristics of the data, such as the state transition probabilities, are known. However, the computational complexity of dynamic programming grows sharply with problem size, placing a heavy computational burden on low-power devices; moreover, not every process has statistical characteristics, and those characteristics may change with conditions and time, so this class of methods still has shortcomings.
The second class of dynamic methods is based on reinforcement learning and does not presuppose environmental data or a state transition model: the environmental data for each moment become available only when that moment arrives, which better matches the actual situation of remote sensing satellite networks. However, existing studies of this approach either omit the data reception process of the remote sensing satellite network or ignore the particularities of its environment, such as time-varying channel conditions and the satellite's alternation between the sunlit and shadow sides of the Earth. Current research therefore does not comprehensively consider these characteristics of remote sensing satellite networks, and the multi-dimensional joint resource scheduling problem of remote sensing satellite networks remains to be studied further.
In view of this situation, through research and experiments the invention designs an intelligent multi-dimensional joint resource scheduling method that does not depend on known, fixed statistical characteristics and suits the environment and resource scheduling scenario of remote sensing satellite networks in an uncertain environment.
In the intelligent joint resource scheduling method for remote sensing satellite networks in an uncertain environment according to the invention, the established network model fits the environment and resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids directly solving a highly complex planning problem. The method comprises the following steps:
(1) Establish the remote sensing satellite network model under an uncertain environment: the remote sensing satellite network mainly consists of remote sensing satellites, which acquire and transmit data, and ground stations, which receive the data. First, determine the scale and parameters of the remote sensing satellite network, including the number and positions of the remote sensing satellites and ground stations. Then define the state set S, the action set A, the reward R and the action-value function $Q^{\pi}(S_i, P_i)$ of the remote sensing satellite network.
In the present invention the state set is $S = \{B \times D \times H \times E^H\}$. At the start of the i-th time slot, the network state $S_i$ consists of four parts: the current battery charge $B_i$, the amount of data $D_i$ already in the data buffer, the channel parameter $H_i$, and the absorbed solar energy $E_i^H$. The battery charge has an upper limit $B_{max}$ and a lower limit $B_{min}$, and the data buffer has an upper limit $D_{max}$. The satellite-to-ground channel and the inter-satellite channel follow different channel models, so their channel parameters are computed differently. The absorbed solar energy changes as the remote sensing satellite alternates between the shadow and sunlit sides of the Earth; in particular, no solar energy is supplied while the satellite is on the shadow side. The invention establishes dynamic channel models of the satellite-to-ground and inter-satellite links according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and obtains the channel parameter $H_i$ by simulation. Taking the orbital characteristics of satellite operation into account, the invention establishes a dynamic energy harvesting model and obtains the absorbed solar energy $E_i^H$ by simulation.
The action set $A = \{A_r \times A_t\}$ consists of two parts, the receive power $A_r$ and the transmit power $A_t$, which can be written as
$A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$
and
$A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$,
where δ is the step size, 0 means that no data is received or transmitted, and $P_{MAX}$ is the maximum power value. When the transmission link is a satellite-to-ground link, [formula image]; otherwise, [formula image].
The reward R is the amount of data sent by the satellite at the start of the time slot. In the present invention, the action-value function $Q^{\pi}(S_i, P_i)$ is the expected return obtained when the agent, guided by policy π, performs action $P_i$ in state $S_i$. This completes the establishment of the remote sensing satellite network model under an uncertain environment. This step takes into account the data acquisition process of the remote sensing satellite and reflects the resource scheduling characteristic of jointly planning data acquisition and transmission in the remote sensing satellite network. In addition, the method establishes two channel models for the uncertain environment of the remote sensing satellite network and accounts for the characteristics of the energy supply, so it is better suited to the remote sensing satellite network scenario.
(2) Generate the environmental parameter data: simulate the remote sensing satellite network model in STK to export the raw environmental parameter data for one topology period, including the start time, end time and duration of the links established between the remote sensing satellite and every ground station and relay satellite, the longitude, latitude and altitude of the remote sensing satellite, and the duration for which the remote sensing satellite is on the sunlit side. Process the raw data in MATLAB by re-slotting it with τ as the unit time slot length, obtaining for each time slot the time the remote sensing satellite spends in sunlight, the link on/off state and the link duration; these data serve as the environmental parameter data of the intelligent joint resource scheduling method. In this step, the remote sensing satellite network operating scenario built in STK provides the raw environmental parameter data, and processing it in MATLAB makes the simulation results of the proposed method more accurate.
(3) Initialize the parameters required by the intelligent joint resource scheduling method: the number of time slots per period T, the on-board battery capacity $B_{max}$, the battery charge threshold $B_{min}$, the data storage capacity $D_{max}$, the static power consumption $P_{cons}$, the unit time slot length τ, the exploration rate ε, the Critic network parameters $\omega_{critic}$, the Actor network parameters $\omega_{actor}$, the learning rate α, the Critic parameter update interval $T_{copy}$, the Actor parameter update interval $T_{train}$, the total number of training time slots I, the current time slot index i, and the discount factor γ.
(4) Guide the satellite's power allocation: observe the state S_i. For each feasible action, extract the feature vector f_i(S_i, P_i) of the state-action pair using the defined six-dimensional feature function, which reflects the working characteristics of the remote sensing satellite and the influence of the environment. Combining these feature vectors with the Actor network parameter ω_actor, use an ε-greedy policy to select an action P_i from the feasible action set as the power allocation scheme for the current slot, and have the satellite allocate power accordingly. In this step, each of the six feature functions captures one aspect of the joint scheduling of the network's multidimensional resources; together they extract the feature vector of each state-action pair and are used to approximate the value function. The ε-greedy policy keeps the resource scheduling method from becoming trapped in a local optimum.
(5) Pre-transition the remote sensing satellite network state: compute the reward R_i in the remote sensing satellite network model with an uncertain environment and check whether the iteration is finished. If so, go to step (10); otherwise, proceed to the next step and execute a new iteration. Because the proposed method converges after a certain number of iterations, the method terminates when the current slot index i equals the total number of training slots I.
(6) Guide the satellite's power pre-allocation: observe the pre-state S′_i. For each feasible action, extract the feature vector f′_i(S′_i, P′_i) of the state-action pair using the defined six-dimensional feature function; combining it with the Actor network parameter ω_actor, use an ε-greedy policy to select an action P′_i from the feasible action set as the preselected power allocation scheme for the next slot. Then store the sample (f_i, P_i, R_i, f′_i, P′_i) in the experience memory for later network parameter updates. Power pre-allocation here means that the power for the start of the next slot is chosen while still within the current slot i, hence the term "pre-allocation"; the prime mark ′ denotes the pre-variables f′_i, P′_i and S′_i. Because the experience memory has limited capacity, once the number of samples reaches that capacity, each new sample replaces the oldest one.
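An experience memory in which new samples replace the oldest ones once it is full behaves as a fixed-capacity FIFO queue. The minimal sketch below uses Python's collections.deque with maxlen, which discards items from the opposite end when a full deque is appended to. The class and field names are hypothetical; the grouping helper mirrors the per-action grouping of samples used when ω_actor is updated in step (8a).

```python
from collections import deque, namedtuple

# One transition as stored in step (6): (f_i, P_i, R_i, f'_i, P'_i).
Sample = namedtuple("Sample", ["f", "p", "r", "f_next", "p_next"])


class ExperienceMemory:
    """Fixed-capacity buffer: once full, each new sample evicts the oldest one."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, f, p, r, f_next, p_next):
        self.buffer.append(Sample(f, p, r, f_next, p_next))

    def group_by_action(self):
        """Group stored samples by their action P_i (actions must be hashable)."""
        groups = {}
        for s in self.buffer:
            groups.setdefault(s.p, []).append(s)
        return groups

    def __len__(self):
        return len(self.buffer)
```

In practice the satellite-to-ground and inter-satellite links would each get their own instance, since their parameters are updated independently.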
(7) Critic network parameter ω_critic update check: compute the remainder of the current slot index i divided by the Critic update interval T_copy. If i % T_copy = 0, update the Critic network parameter by setting ω_critic = ω_actor and proceed to the next step; otherwise, proceed directly to the next step. ω_critic is used to compute the approximate target action value; it is also the output parameter of the method and guides the joint scheduling of the multidimensional resources. ω_critic is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own ω_critic, and the one matching the current link condition is used in the computation. Because the two link types have different transmit action sets, the ω_critic of the satellite-to-ground link and the ω_critic of the inter-satellite link have different numbers of rows.
(8) Actor network parameter ω_actor update check: compute the remainder of the current slot index i divided by the Actor update interval T_train. If i % T_train = 0, update ω_actor according to the gradient descent strategy and proceed to the next step; otherwise, proceed directly to the next step. ω_actor takes part in selecting the power allocation scheme. It is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own ω_actor, and the one matching the current link condition is used in the computation. Because the two link types have different transmit action sets, the ω_actor of the satellite-to-ground link and the ω_actor of the inter-satellite link have different numbers of rows.
(9) Update the network state, the action and the current iteration count: S_{i+1} = S′_i, P_{i+1} = P′_i, i = i + 1. This completes one iteration; then return to step (5) for the network state pre-transition, where the check either ends the iteration and outputs the result or starts a new iteration. In this step, the new state S_{i+1} and new action P_{i+1} are the pre-state S′_i and pre-action P′_i from steps (5) and (6).
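Steps (5) to (10) form one loop over slots with two periodic checks. The sketch below shows only that control flow and relies on hypothetical callbacks: collect_sample stands for steps (5), (6) and (9) (reward, pre-allocation, sample storage and state hand-over), and actor_update stands for the gradient step of step (8). Weights are NumPy arrays with one row per action and one column per feature, matching the matrix shape described for ω_actor and ω_critic.

```python
import numpy as np


def run_training(num_actions, num_features, total_slots, t_copy, t_train,
                 collect_sample, actor_update):
    """Control-flow sketch of steps (5)-(10); returns the trained Critic weights.

    collect_sample(i, w_actor): hypothetical callback for steps (5), (6), (9).
    actor_update(w_actor): hypothetical callback for step (8); returns new weights.
    """
    w_actor = np.zeros((num_actions, num_features))
    w_critic = w_actor.copy()
    for i in range(1, total_slots + 1):   # step (5): stop once i reaches I
        collect_sample(i, w_actor)        # steps (5), (6) and (9)
        if i % t_copy == 0:               # step (7): omega_critic = omega_actor
            w_critic = w_actor.copy()
        if i % t_train == 0:              # step (8): gradient-descent update
            w_actor = actor_update(w_actor)
    return w_critic                       # step (10): output omega_critic
```

Because the copy (step 7) is checked before the update (step 8) within a slot, ω_critic lags ω_actor and acts as a fixed target between copies.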
(10) Obtain the network parameter ω_critic that guides joint scheduling: output the ω_critic trained by the intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network; the method then ends. In practical applications, the joint resource scheduling scheme is generated from this parameter with a greedy policy (the ε-greedy policy with ε = 0). Specifically, at the start of each slot during satellite operation, form the state-action pair (S_i, P_i) from the current state S_i and each action in the feasible action set, extract its feature vector, compute the approximate action value with ω_critic, select the action P_i with the largest approximate action value as the best action, execute it, move to the next slot, and repeat.
To address the shortcoming that the prior art does not comprehensively consider the remote sensing satellite network scenario and the task transmission flow of remote sensing satellites, the invention jointly considers the working characteristics and the environmental characteristics of the remote sensing satellite, obtains time-varying environment data and link information by simulating the network, and uses reinforcement learning as the algorithmic framework and a Markov decision process as the basic model, adding the ideas of linear approximation and a double-network structure. This yields a complete technical scheme for intelligent joint resource scheduling of remote sensing satellites in an environment-uncertain remote sensing satellite network, and addresses the problem of jointly exploiting the satellites' multidimensional resources to optimize network transmission performance. The idea of the invention is as follows: first, establish the remote sensing satellite network model and define the task transmission flow of the remote sensing satellite; second, take the maximum network transmission data volume as the objective function, formulate constraints from the environment and the satellite attributes, and model the joint resource scheduling problem as an optimization problem, which cannot be solved directly because non-causal (future) data are unavailable in a remote sensing satellite network; finally, propose an intelligent resource joint scheduling method that, using only causal data, guides the remote sensing satellite to reach optimal joint resource scheduling through continuous learning.
Starting from zero experience, the remote sensing satellite accumulates experience through trial and error during a learning period and keeps updating value-based parameters until they converge. The converged parameters output by the method serve as the basis for the final power allocation, realizing multidimensional resource joint scheduling. Beforehand, the invention simulates the network to obtain the link connectivity and energy arrivals within one topology period as the method's environmental parameters. At the start of each slot, the remote sensing satellite allocates power according to an ε-greedy policy: with probability ε it selects a power allocation scheme at random, and with probability 1 − ε it selects the scheme that maximizes the action value function. Because the state space of the remote sensing satellite is infinite, the invention introduces a weight vector; its inner product with the feature vector obtained by feature extraction linearly approximates the action value function. After a power allocation scheme is determined and executed, the remote sensing satellite receives a reward as feedback evaluating that allocation, and a state pre-transition is performed. A power allocation scheme is then selected with the ε-greedy policy and executed at the start of the next slot; this process repeats as the slots advance, and each repetition can be regarded as one slot iteration. In each slot, part of the parameters are saved to the experience memory as samples for the subsequent network parameter updates.
The Actor network parameters are updated at regular intervals by the gradient descent strategy: the weight vector is adjusted in the direction opposite to the gradient so that the error between the approximate action value function and the action value function decreases. The Critic network parameters are updated by copying them from the Actor network. To ensure convergence, both the exploration rate ε and the learning rate α decrease as the number of iterations increases. Intuitively, as the iterations accumulate, the multidimensional resource joint scheduling strategy guided by the weight vector becomes increasingly effective and the remaining room for improvement shrinks. Simulation results show that the method converges after a certain number of iterations and outperforms the other compared methods under the same conditions.
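The text states only that ε and α decrease as the iteration count grows; it does not give the schedules in this passage. A minimal sketch with two common but hypothetical choices: exponential decay with a floor for ε, and harmonic decay for α, whose sum diverges while the sum of its squares converges (the usual Robbins-Monro condition for stochastic approximation). All constants are placeholders, not values from the patent.

```python
def exploration_rate(i, eps0=1.0, decay=0.999, eps_min=0.01):
    """Hypothetical schedule: epsilon_i = max(eps_min, eps0 * decay**i)."""
    return max(eps_min, eps0 * decay ** i)


def learning_rate(i, alpha0=0.1, kappa=1000.0):
    """Hypothetical schedule: alpha_i = alpha0 / (1 + i / kappa)."""
    return alpha0 / (1.0 + i / kappa)


for i in (0, 1000, 5000):
    print(i, round(exploration_rate(i), 4), round(learning_rate(i), 4))
```

The floor on ε keeps a small amount of exploration late in training; dropping it (eps_min = 0) makes the policy fully greedy in the limit.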
Example 2
The intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network is the same as in Example 1. To keep the resource scheduling method from falling into a local optimum, an ε-greedy policy is used in the learning stage. Early in the learning stage, the remote sensing satellite is more inclined to explore, i.e. to try resource scheduling schemes it has not tried before; later in the learning stage, it is greedier, i.e. it selects the best resource scheduling scheme according to its existing experience.
In step (4), the satellite's power allocation uses an ε-greedy policy to select an action P_i from the feasible action set as a receive/transmit power pair. Under the current environmental conditions, the remote sensing satellite then receives and transmits the corresponding data according to this power allocation scheme, at the cost of the energy it consumes. The procedure is as follows:
(4a) Compute the feasible action set {A_f}_i: because the remote sensing satellite is constrained by its on-board battery energy and its data buffer capacity, it must allocate power while satisfying these resource constraints. The capacity threshold B_min prevents the battery energy from dropping so low that the satellite's service life is exhausted. The invention therefore specifies that when the battery level B_i of the remote sensing satellite is below B_min + P_cons × τ, the satellite neither transmits nor receives any data. From the network state S_i, compute the feasible action set {A_f}_i of all actions P_i that satisfy the constraints; each action consists of a receive power P_ir and a transmit power P_it. The resource constraints are as follows:
Figure BDA0002767545850000101
where τ_ih denotes the link duration at the start of the i-th slot, and C_it(P_it, H_i) denotes the link transmission rate at the start of the i-th slot under the current transmit power P_it and channel parameter H_i, which can be calculated as follows:
Figure BDA0002767545850000102
Here, B_c (Hz) denotes the channel bandwidth.
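The constraint and rate formulas of step (4a) are figures not reproduced here. The sketch below assumes a Shannon-type rate C = B_c·log2(1 + P_it·H_i), with H_i treated as a normalized gain-to-noise ratio; an energy cost of (P_ir + P_it + P_cons)·τ per slot that must leave at least B_min in the battery; and the requirement that data sent over the link duration τ_ih not exceed the buffered data D_i. Only the idle rule (B_i < B_min + P_cons·τ) comes directly from the text; the other constraints and all names are assumptions.

```python
import math
from itertools import product


def link_rate(p_t, h, bandwidth_hz):
    """Assumed rate C = B_c * log2(1 + p_t * h) in bit/s."""
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def feasible_actions(b_i, d_i, h_i, tau, tau_link, p_cons, b_min,
                     recv_powers, send_powers, bandwidth_hz):
    """Enumerate (P_ir, P_it) pairs meeting the assumed energy and buffer limits.

    tau_link: link duration tau_ih within the slot (assumed <= tau).
    """
    if b_i < b_min + p_cons * tau:          # rule from step (4a): stay idle
        return [(0.0, 0.0)]
    actions = []
    for p_r, p_t in product(recv_powers, send_powers):
        energy = (p_r + p_t + p_cons) * tau                 # assumed energy model
        sent = link_rate(p_t, h_i, bandwidth_hz) * tau_link # bits sent this slot
        if b_i - energy >= b_min and sent <= d_i:
            actions.append((p_r, p_t))
    return actions
```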
(4b) Compute the six-dimensional feature vector f_i(S_i, P_i): for each state-action pair (S_i, P_i), compute the feature vector f_i(S_i, P_i) with the six feature functions. Each element of the feature vector is a number no greater than 1 that evaluates, along its own dimension, the action P_i performed in state S_i; each feature function is a function of state and action.
(4c) Select the action P_i according to the ε-greedy policy: under this policy, the remote sensing satellite either selects, with probability 1 − ε, the action in the feasible action set {A_f}_i with the largest approximate action value, or selects an action at random with probability ε. Exactly one action results; the action P_i selected according to the ε-greedy policy is expressed as follows:
Figure BDA0002767545850000103
wherein,
Figure BDA0002767545850000104
denotes the approximate action value function of the i-th slot, based on the state-action pair (S_i, P_i) and the Actor network parameter ω_actor,
Figure BDA0002767545850000105
Here, by the definition of the state set S, the state space of the remote sensing satellite network is continuous and infinite. Even if the states were discretized, the state table would have to be enormous to keep distortion low, which makes storage difficult. The invention therefore uses linear approximation to plan directly over continuous states.
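A minimal sketch of step (4c), assuming the approximate action value is the inner product of the weight row for action P_i (one row of the ω_actor matrix, whose rows index actions and whose columns index features) with the feature vector f_i(S_i, P_i). Actions are identified by their row index, and features maps each feasible index to its feature vector. The function names are hypothetical.

```python
import random

import numpy as np


def approx_q(w, action_index, feature_vec):
    """Linear approximation: Q(S_i, P_i; w) = w[P_i] . f_i(S_i, P_i)."""
    return float(np.dot(w[action_index], feature_vec))


def epsilon_greedy(w, feasible, features, epsilon, rng=random):
    """With probability epsilon pick a random feasible action; otherwise pick the
    feasible action with the largest approximate action value."""
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return max(feasible, key=lambda a: approx_q(w, a, features[a]))
```

Passing ω_critic and epsilon=0 gives the purely greedy rule used after training in step (10).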
Example 3
The intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network is the same as in Example 1. Computing the six-dimensional feature vector f_i(S_i, P_i) in step (4b) involves the following six dimensions.
(4b1) Compute the first dimension: the first dimension indicates whether the action takes the battery energy state into account, i.e. whether the energy consumed by performing the action can prevent the potential energy overflow caused by absorbing solar energy. In a resource-limited remote sensing satellite network, solar energy is precious, and the remote sensing satellite should make full use of the harvested solar energy to store and transmit data. Its feature function f_1(S_i, P_i) is expressed as follows:
Figure BDA0002767545850000111
wherein,
Figure BDA0002767545850000112
denotes the energy consumption of the current slot. From the energy perspective, this dimension accounts for both the environment of the remote sensing satellite network and the attributes of the satellite itself: it considers not only solar energy harvesting but also the static energy consumption of each slot and the upper limit of the on-board battery capacity.
(4b2) Compute the second dimension: the second dimension indicates whether the action takes the data buffer state into account, i.e. whether the amount of data sent can prevent the potential data overflow caused by received data. Data overflow means that the energy spent receiving data does not match the data actually stored, so part of the receive energy is wasted without yielding its expected return. Its feature function f_2(S_i, P_i) is expressed as follows:
Figure BDA0002767545850000113
wherein,
Figure BDA0002767545850000114
denotes the amount of data received by the satellite in the i-th slot,
Figure BDA0002767545850000115
is the amount of data transmitted by the satellite in the i-th slot,
Figure BDA0002767545850000116
D_R^max denotes the maximum amount of received data. This dimension considers the satellite's data-reception process from the perspective of the data buffer.
(4b3) Compute the third dimension: the third dimension indicates whether the action is consistent with the optimal power allocation scheme. Because the satellite-to-ground link and inter-satellite link models differ, the feature function f_3(S_i, P_i) must be expressed separately for each link condition.
Figure BDA0002767545850000117
Wherein,
Figure BDA0002767545850000118
consists of a receive power and a transmit power. It is obtained with the Lagrange multiplier method, with the objective of maximizing the total data transmitted in the current slot and the next slot under the multidimensional resource constraints. There are four link-switching cases for two consecutive slots: satellite-to-ground link to satellite-to-ground link, inter-satellite link to inter-satellite link, inter-satellite link to satellite-to-ground link, and satellite-to-ground link to inter-satellite link.
For two consecutive satellite-to-ground links, the optimal power is obtained by the water-filling theorem.
Figure BDA0002767545850000121
Figure BDA0002767545850000122
Figure BDA0002767545850000123
where B_s = P_cons × τ denotes the static energy consumption, P_i^WF denotes the optimal power value,
Figure BDA0002767545850000124
is the average of the historical channel parameters and serves as an estimate of the channel parameter at the next time instant, and B_[i,i+1] denotes the maximum energy available for data transmission across the i-th and (i+1)-th slots. To guarantee the feasibility of the optimal power, the invention imposes the following constraint:
Figure BDA0002767545850000125
wherein,
Figure BDA0002767545850000126
denotes the maximum transmit power within the current feasible action set,
Figure BDA0002767545850000127
denotes the floor (round-down) operation, and δ_i denotes the step size of the transmit power set in this slot. Then, the total data transmitted in the two slots
Figure BDA0002767545850000128
can be expressed as follows:
Figure BDA0002767545850000129
wherein,
Figure BDA00027675458500001210
denotes, given that the current time uses the power allocation scheme
Figure BDA00027675458500001211
the maximum transmit power in the feasible action set after the transition to the next time.
For two consecutive inter-satellite links, the optimal power is obtained by linear programming.
Figure BDA00027675458500001212
where B′_i and D′_i denote the resources remaining at the current time once the resource allocation for the next time is known. They can be calculated as follows:
Figure BDA00027675458500001213
Figure BDA00027675458500001214
Figure BDA0002767545850000131
denotes the maximum transmit power in the feasible action set at the next time when the current time uses the power allocation scheme [P_ir, 0]. Formula (1) is then applied again to constrain
Figure BDA0002767545850000132
in two further cases, giving
Figure BDA0002767545850000133
Then, the total data transmitted in the two slots
Figure BDA0002767545850000134
can be expressed as follows:
Figure BDA0002767545850000135
For an inter-satellite link followed by a satellite-to-ground link, the optimal power allocation scheme is obtained by the Lagrange multiplier method.
Figure BDA0002767545850000136
In formulas (2) and (3), replace
Figure BDA0002767545850000137
with
Figure BDA0002767545850000138
to obtain B′_i and D′_i, from which
Figure BDA0002767545850000139
is obtained as follows:
Figure BDA00027675458500001310
The optimal power is then constrained by formula (1), giving
Figure BDA00027675458500001311
Then, the total data transmitted in the two slots
Figure BDA00027675458500001312
can be expressed as follows:
Figure BDA00027675458500001313
For a satellite-to-ground link followed by an inter-satellite link, the optimal power scheme is obtained by the Lagrange multiplier method.
Figure BDA00027675458500001314
The optimal power is constrained by formula (1) to obtain the optimal power scheme
Figure BDA00027675458500001315
Then, the total data transmitted in the two slots
Figure BDA00027675458500001316
can be expressed as follows:
Figure BDA00027675458500001317
In summary, vary P_ir and compute the corresponding
Figure BDA00027675458500001318
then, by comparing the corresponding
Figure BDA00027675458500001319
find the optimal power allocation scheme
Figure BDA00027675458500001320
that maximizes
Figure BDA00027675458500001321
This maximization reflects the effect of power allocation over two consecutive slots: the current slot and the next one.
(4b4) Compute the fourth dimension: the fourth dimension indicates whether network resources are fully used when energy is abundant, so that energy is not wasted. That is, when the energy supply is abundant, the remote sensing satellite should choose the resource schedule with the largest energy consumption, so that it can absorb more solar energy to store for subsequent slots. Its feature function f_4(S_i, P_i) is expressed as follows:
Figure BDA00027675458500001322
wherein,
Figure BDA0002767545850000141
denotes the maximum energy that can be consumed within the feasible action set of the remote sensing satellite's current slot. This dimension reflects the upper limit on the capacity of the on-board battery.
(4b5) Compute the fifth dimension: in the second dimension, the feature value for data overflow is defined as 0, because the data actually received does not match the energy spent, so energy is wasted. The fifth dimension complements the second: it indicates that when energy is abundant, the energy wasted through data overflow is negligible. Its feature function f_5(S_i, P_i) is expressed as follows:
Figure BDA0002767545850000142
This dimension reflects the upper limit on the capacity of the remote sensing satellite's data buffer.
(4b6) Compute the sixth dimension: the sixth dimension concerns the receive power allocation. Because the data memory has a capacity limit, a larger receive power does not necessarily mean more stored data. The feature function f_6(S_i, P_i) therefore reflects the effectiveness of the receive power allocation, as follows:
Figure BDA0002767545850000143
f_6(S_i, P_i) is the sixth feature function of the i-th slot. This dimension represents the efficiency with which the remote sensing satellite receives data, i.e. whether the energy spent matches the data actually stored in the data buffer.
The outputs of the six feature functions form the six-dimensional feature vector that guides the satellite's power allocation, and they are an important basis for selecting the action P_i in step (4c).
Because the state space of the remote sensing satellite network is continuous and infinite, the value function cannot be solved directly. To obtain an approximate value function, the invention uses linear approximation: the dot product of the six-dimensional feature vector and the weight vector gives the approximate action value, which is the basis for selecting the power allocation scheme. The six-dimensional feature vector is obtained from the six feature functions, which are functions of state and action, and its definition is closely tied to the environment of the remote sensing satellite network and to attributes of the remote sensing satellite such as its task transmission characteristics.
The method is based on reinforcement learning and helps the remote sensing satellite perform multidimensional resource joint scheduling using only causal data. It handles the infinite state space through linear approximation, defining a six-dimensional feature function and a weight vector. By constructing two independent networks with the same structure but different parameters, it also mitigates, to some extent, the overestimation problem during parameter updates.
Example 4
The intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network is the same as in Examples 1 to 3, and the ε-greedy policy used in step (6) is the same as that in step (4). The difference is that the exploration rate ε′_i of the ε-greedy policy in step (6) changes: ε′_i is updated as ε′_i = ε_{i+1}, and
Figure BDA0002767545850000144
Because step (6) pre-allocates the power for the next slot, the exploration rate ε′_i needs to be lower than in step (4). This continues the downward trend of the exploration rate over the "learning" process, i.e. the power allocation scheme with the largest approximate action value is selected with a higher probability.
Example 5
The intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network is the same as in Examples 1 to 4. In step (8), ω_actor is updated according to the gradient descent strategy; this parameter update is the process of continually "learning" and optimizing the weight vector. It proceeds as follows:
(8a) Sampling: samples (f_i, P_i, R_i, f′_i, P′_i) in the experience memory that share the same P_i are placed in one group, and the number of samples in each group is denoted M_P. Because the parameters of the satellite-to-ground link and the inter-satellite link are updated independently, each link type has its own experience memory, from which its own Actor network parameters are sampled and updated.
(8b) Compute the cost function Y(ω_actor) of each group of samples: for each group, compute the cost function Y(ω_actor):
Figure BDA0002767545850000151
Figure BDA0002767545850000152
Figure BDA0002767545850000153
Wherein,
Figure BDA0002767545850000154
denotes the approximate target action value function;
(8c) Update ω_actor: apply gradient descent to the cost function with respect to ω_actor to complete the update of ω_actor:
Figure BDA0002767545850000155
where the subscript n denotes the slot index corresponding to the sample,
Figure BDA0002767545850000156
denotes the learning rate of the current slot; the assignment completes the update of the Actor network parameter ω_actor.
The invention obtains the approximate action value function by linear approximation and uses it to approximate the action value function. The Actor network parameter ω_actor is therefore updated during "learning" in the direction that decreases the error between the action value function and its approximation.
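The cost function Y(ω_actor) and the update rule are figures. Assuming they follow the usual fixed-target form, with target y = R_n + γ·ω_critic[P′_n]·f′_n computed from the Critic parameters and cost Y = (1/(2M_P))·Σ_n (y_n − ω_actor[P_n]·f_n)², one gradient-descent step over a group of M_P samples sharing the same action can be sketched as below. Only the row of ω_actor for that action changes, since the other rows do not appear in the cost. The constant factors and the names are assumptions.

```python
import numpy as np


def actor_update(w_actor, w_critic, group, action, alpha, gamma):
    """One gradient-descent step on Y = mean(0.5 * (y - w_actor[action] . f)**2).

    group: samples (f, p, r, f_next, p_next) whose action p equals `action`;
    f and f_next are NumPy feature vectors, p_next is a row index.
    """
    grad = np.zeros_like(w_actor[action])
    for f, _p, r, f_next, p_next in group:
        target = r + gamma * np.dot(w_critic[p_next], f_next)  # Critic held fixed
        td_error = target - np.dot(w_actor[action], f)
        grad -= td_error * f  # gradient of 0.5 * td_error**2 w.r.t. the row
    w_new = w_actor.copy()
    w_new[action] -= alpha * grad / len(group)  # step against the gradient
    return w_new
```

Holding ω_critic fixed inside the target is what separates this from a plain single-network update and is the role the text assigns to the second network.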
The following detailed example further illustrates the invention.
Example 6
The intelligent resource joint scheduling method under the environment-uncertain remote sensing satellite network is the same as in Examples 1 to 5. Referring to Fig. 1, it comprises the following steps:
Step 1: determine the scale of the remote sensing satellite network and take a Markov decision process as the basic model of the multidimensional resource joint scheduling problem, introducing mainly the concepts of the state set, the action set, the reward and the action value function.
Fig. 2 is a schematic diagram of the remote sensing satellite network model of the invention. The remote sensing satellite network consists mainly of remote sensing satellites, relay satellites and ground stations. Inter-satellite links are established between satellites, and satellite-to-ground links between satellites and ground stations. An inter-satellite link carries either unidirectional transmission from a remote sensing satellite to a relay satellite or bidirectional transmission between relay satellites. A satellite-to-ground link carries only unidirectional transmission from a remote sensing satellite or relay satellite to a ground station. To complete continuous information transmission tasks, the remote sensing satellites, relay satellites and ground stations must cooperate. The remote sensing satellite network is modeled as follows:
(1a) Give the scale of the remote sensing satellite network: ground stations, remote sensing satellites and relay satellites together form the network. The ground stations are GS = {GS_1, GS_2, ..., GS_J}, where J is the total number of ground stations; they receive the data transmitted from the remote sensing satellites and relay satellites and are the destination of all data. The remote sensing satellites are RSS = {RSS_1, RSS_2, ..., RSS_K}, where K is the total number of remote sensing satellites; each samples environmental information with its on-board equipment, stores it as data and then transmits the data to a ground station or relay satellite, so the remote sensing satellites are the origin of the remote sensing data. The relay satellites are RS = {RS_1, RS_2, ..., RS_L}, where L is the total number of relay satellites; they help the remote sensing satellites store and forward data.
(1b) Time discretization: for convenience of analysis, the invention divides continuous time into slots of equal length τ, and the total number of slots of network operation is I. The i-th slot is denoted slot_i = [t_i, t_{i+1}], where i = 0, 1, ..., I − 1; t_0 denotes the start of operation and t_I denotes the end of operation. In what follows, the subscript i denotes the value of a variable at the start time t_i of the i-th slot.
(1c) Define the state set S: the state set S is composed of multidimensional resources and mainly comprises the battery state $B_i$, the remaining battery charge in joules (J); the data buffer state $D_i$, the amount of data stored in the buffer in bits; the channel parameter $H_i$, which reflects the data transmission capability of the link and can be obtained by on-board sensors; and the solar energy $E_i^{H}$ that the remote sensing satellite can absorb, in joules (J). $H_i$ and $E_i^{H}$ describe the environment state, while $B_i$ and $D_i$ describe the state of the remote sensing satellite.
(1d) Define the action set A: actions are expressed as power values in watts (W). The receive power, the satellite-to-ground transmit power, and the inter-satellite transmit power each take values in a discrete finite set with a fixed step. The power level determines how much data is received and transmitted, as well as how much energy is consumed. The transmit power set can be expressed as
$A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$
and the receive power set as
$A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$
where δ is the step size, 0 means that no data is received or transmitted, and $P_{MAX}$ is the maximum receive or transmit power. Because the distances and channel conditions between the remote sensing satellite and the ground station differ from those between the remote sensing satellite and the relay satellite, the maximum satellite-to-ground transmit power is generally larger than the maximum inter-satellite transmit power.
(1e) Define the reward R: the reward the agent receives after an action moves the system from one state to another. The reward is a numerical value that can be set according to the scenario. Since the task of the remote sensing satellite is data transmission, the invention uses the amount of data the remote sensing satellite transmits in one time slot (in GB) as the reward $R_i$: the more data transmitted, the larger the reward, and vice versa.
(1f) Define the action-value function $Q^{\pi}(S_i, P_i)$: the expected return obtained when the agent, guided by policy π, performs action $P_i$ in state $S_i$. The action value evaluates the effect of the action on all subsequent time slots:
$$Q^{\pi}(S_i, P_i) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{i+k} \,\middle|\, S_i, P_i\right]$$
where the policy π is the rule by which the agent selects actions. Because a continuing task has no terminal state, a discount factor γ (0 ≤ γ < 1) is introduced into the cumulative reward so that the return converges.
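As a minimal illustration of the role of γ (not a step of the patented procedure), the Python sketch below computes a truncated discounted return for a finite sequence of per-slot rewards. With bounded rewards and γ < 1, the infinite sum is bounded by $R_{max}/(1-\gamma)$, which is why the return converges for a continuing task. The reward values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Truncated discounted return: sum over k of gamma**k * rewards[k]."""
    g = 0.0
    # Accumulate backwards using G_k = R_k + gamma * G_{k+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Hypothetical per-slot rewards (GB); gamma = 0.9 as in Example 7.
    print(discounted_return([1.0, 2.0, 3.0], 0.9))   # 1 + 0.9*2 + 0.81*3
    # A constant reward of 1 approaches the bound 1 / (1 - 0.9) = 10.
    print(discounted_return([1.0] * 1000, 0.9))
```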
Step 2, pre-generate the data for one topology period, including the link on/off status, the link connection duration, and the time the remote sensing satellite spends in sunlight in each slot, as the environmental parameters of the method.
(2a) For a given set of remote sensing satellite orbits, relay satellite orbits, and ground station positions, simulate the network in STK and export the link connectivity, position, and environment information over one topology period T (in slots). This includes the start time, end time, and duration of the links between the remote sensing satellite and all ground stations and relay satellites; the longitude, latitude, and altitude of the remote sensing satellite; and the time the remote sensing satellite spends in sunlight.
(2b) Re-slot the data with slot length τ. For each slot, compute the time the remote sensing satellite spends in sunlight, the link on/off status and link duration $\tau_{ih}$, and the position of the remote sensing satellite at the start of the slot.
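The re-slotting in (2b) can be sketched in Python as below. The sketch assumes that the STK access and sunlight intervals have already been exported and converted to (start, stop) pairs in seconds from the start of the topology period; the function names and input format are hypothetical. A slot's overlap with the link windows gives $\tau_{ih}$, the link is taken as "on" in a slot when that overlap is positive, and the same function applied to the sunlight intervals gives the sunlit time per slot.

```python
import math


def per_slot_durations(windows, tau, num_slots, t0=0.0):
    """Overlap (s) between each slot [t0 + i*tau, t0 + (i+1)*tau) and a set of
    (start, stop) windows, e.g. link access intervals or sunlit intervals."""
    durations = [0.0] * num_slots
    for start, stop in windows:
        if stop <= start:
            continue                                   # skip empty windows
        first = max(0, int(math.floor((start - t0) / tau)))
        last = min(num_slots - 1, int(math.floor((stop - t0) / tau)))
        for i in range(first, last + 1):
            lo = t0 + i * tau
            hi = lo + tau
            overlap = min(stop, hi) - max(start, lo)
            if overlap > 0:
                durations[i] += overlap
    return durations


def link_on_off(durations):
    """A link is considered on in a slot if it is connected for any time in it."""
    return [d > 0 for d in durations]


if __name__ == "__main__":
    tau_ih = per_slot_durations([(100, 700)], tau=300, num_slots=4)
    print(tau_ih, link_on_off(tau_ih))
```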
Step 3, initialize the parameters required by the intelligent resource joint scheduling method: the number of slots per period T, on-board battery capacity $B_{max}$, battery capacity threshold $B_{min}$, data storage capacity $D_{max}$, static power consumption $P_{cons}$, unit slot length τ, exploration rate ε, Critic network parameter $\omega_{critic}$, Actor network parameter $\omega_{actor}$, learning rate α, Critic network parameter update interval $T_{copy}$, Actor network parameter update interval $T_{train}$, total number of training slots I, current slot number i, and discount factor γ.
Step 4, observe the state $S_i$ and use an ε-greedy policy to select an action $P_i$ from the feasible action set as the power allocation scheme for the current time slot.
Referring to Fig. 3, Fig. 3 is a sub-flowchart of selecting an action according to the ε-greedy policy in the invention. The steps are as follows:
(4a) From the state $S_i$, compute the feasible action set $\{A_f\}_i$ of all actions $P_i$ that satisfy the following conditions. Each action consists of a receive power $P_{ir}$ and a transmit power $P_{it}$:
Figure BDA0002767545850000181
where $C_{it}(P_{it}, H_i)$ is the link transmission rate at the start of the i-th slot under the current transmit power $P_{it}$ and channel parameter $H_i$, calculated as follows:
Figure BDA0002767545850000182
where $B_c$ (Hz) is the channel bandwidth.
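The feasibility check in (4a) can be sketched in Python as below. Because the constraint and rate formulas survive only as equation images, the sketch rests on stated assumptions. The rate is taken to have the Shannon form $B_c \log_2(1 + P_{it} H_i)$, with $H_i$ treated as a gain-to-noise ratio. An action is treated as feasible if the battery stays at or above $B_{min}$ after spending $(P_{ir} + P_{it} + P_{cons})\tau$, if the data sent, $\tau_{ih} C_{it}$, does not exceed the buffered data $D_i$, and if no transmit power is used while the link is down ($\tau_{ih} = 0$). The function names are hypothetical.

```python
import math
from itertools import product


def link_rate(p_t, h, bandwidth_hz):
    """Assumed Shannon-type rate (bit/s); h is a gain-to-noise ratio in 1/W."""
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def feasible_actions(battery_j, buffer_bit, h, tau, tau_ih, recv_set, send_set,
                     p_cons, b_min, bandwidth_hz):
    """Enumerate (P_r, P_t) pairs that satisfy the assumed constraints."""
    actions = []
    for p_r, p_t in product(recv_set, send_set):
        if tau_ih == 0 and p_t > 0:
            continue                                  # no link: do not transmit
        energy = (p_r + p_t + p_cons) * tau           # energy drawn over the slot
        if battery_j - energy < b_min:
            continue                                  # battery would fall below B_min
        sent = tau_ih * link_rate(p_t, h, bandwidth_hz)
        if sent > buffer_bit:
            continue                                  # cannot send more than is stored
        actions.append((p_r, p_t))
    return actions


if __name__ == "__main__":
    print(feasible_actions(battery_j=10000, buffer_bit=400, h=1.0, tau=100,
                           tau_ih=100, recv_set=[0, 30], send_set=[0, 10, 20],
                           p_cons=10, b_min=6000, bandwidth_hz=1.0))
```

The toy numbers in the usage example are chosen only so that each constraint removes at least one candidate.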
(4b) For each state-action pair $(S_i, P_i)$, compute a six-dimensional feature vector $f_i(S_i, P_i)$. The feature vector is a function of state and action. Each component is a value not exceeding 1 that rates, under one criterion, how good it is to perform the action in the current state. The six dimensions are as follows.
First dimension: indicates whether the action takes the battery energy state into account, i.e. whether the energy consumed by the action can prevent the potential energy overflow caused by absorbing solar energy. Its feature function $f_1(S_i, P_i)$ is:
Figure BDA0002767545850000183
where [Figure BDA0002767545850000184] denotes the energy consumption of the current time slot.
Second dimension: indicates whether the action takes the data buffer state into account, i.e. whether the amount of data sent can prevent the potential data overflow caused by received data. Its feature function $f_2(S_i, P_i)$ is:
Figure BDA0002767545850000185
where [Figure BDA0002767545850000191] denotes the amount of data received by the remote sensing satellite in the i-th slot, [Figure BDA0002767545850000192], and $DR_{max}$ denotes the maximum amount of received data.
Third dimension: indicates whether the action agrees with the optimal power allocation scheme. Because the satellite-to-ground and inter-satellite link models differ, the feature function $f_3(S_i, P_i)$ must be expressed separately for each link condition:
Figure BDA0002767545850000193
where [Figure BDA0002767545850000194] consists of a receive power and a transmit power. It is obtained with the Lagrange multiplier method by maximizing the total amount of data transmitted in the current and next time slots, subject to the multidimensional resource constraints. Four cases are distinguished according to how the link switches.
Two consecutive satellite-to-ground links: the optimal power is solved by water-filling.
Figure BDA0002767545850000195
Figure BDA0002767545850000196
Figure BDA0002767545850000197
where $B_s = P_{cons} \times \tau$ is the static energy consumption, $P_i^{WF}$ is the optimal power value, and [Figure BDA0002767545850000198] is the average of the historical channel parameters, used to estimate the channel parameter at the next time instant. $B_{[i,i+1]}$ is the maximum energy available for data transmission over the i-th and (i+1)-th time slots. To guarantee that the optimal power is feasible, the invention imposes the following constraint:
Figure BDA0002767545850000199
where [Figure BDA00027675458500001910] is the maximum transmit power in the current feasible action set, $\lfloor \cdot \rfloor$ denotes rounding down, and $\delta_i$ is the step size of the transmit power set in this slot. The total amount of data transmitted over the two time slots, [Figure BDA00027675458500001912], can then be expressed as:
Figure BDA00027675458500001913
where [Figure BDA00027675458500001914] is the maximum transmit power in the feasible action set at the next time instant, after the transition under the current power allocation scheme [Figure BDA00027675458500001915].
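The water-filling step for two consecutive satellite-to-ground links can be sketched in Python as below. The patent's exact expressions are equation images, so this sketch assumes a Shannon-type rate $\log_2(1 + P H)$ with H a gain-to-noise ratio, equal slot lengths τ, and an energy budget $B_{[i,i+1]}$ spent only on transmission over the two slots. It also uses the historical mean channel as a stand-in for the channel in slot i+1. The final clip and round-down to the step $\delta_i$ mirrors the feasibility constraint described above. All function names are hypothetical.

```python
import math


def water_filling(gains, total_power):
    """Maximize sum_k log2(1 + p_k * g_k) s.t. sum_k p_k = total_power, p_k >= 0.

    gains: positive channel gain-to-noise ratios (1/W). Returns per-channel powers.
    """
    floors = sorted((1.0 / g, k) for k, g in enumerate(gains))
    powers = [0.0] * len(gains)
    # Start with all channels active; drop the weakest until the water level
    # lies above the floor (1/g) of every active channel.
    for m in range(len(floors), 0, -1):
        level = (total_power + sum(f for f, _ in floors[:m])) / m
        if level > floors[m - 1][0]:
            for f, k in floors[:m]:
                powers[k] = level - f
            return powers
    return powers  # total_power <= 0: nothing to allocate


def quantize(p, p_max, step):
    """Clip to the largest feasible power, then round down to the power-set grid."""
    return step * math.floor(min(p, p_max) / step)


def two_slot_sg_power(h_now, h_avg, energy_budget_j, tau, p_max, step):
    """Slot i uses H_i; slot i+1 uses the historical mean channel as an estimate."""
    p_now, p_next = water_filling([h_now, h_avg], energy_budget_j / tau)
    return quantize(p_now, p_max, step), p_next


if __name__ == "__main__":
    print(two_slot_sg_power(h_now=1.0, h_avg=0.5, energy_budget_j=300.0,
                            tau=100.0, p_max=80, step=1))
```

Water-filling gives more power to the slot with the better channel. If one channel is much worse, the whole budget goes to the better one, as in the second test case.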
Two consecutive inter-satellite links: the optimal power is solved by linear programming.
Figure BDA0002767545850000201
where $B'_i$ and $D'_i$ are the resources still available at the current time once the resource allocation at the next time is known. They are calculated as follows:
Figure BDA0002767545850000202
Figure BDA0002767545850000203
[Figure BDA0002767545850000204] is the maximum transmit power in the feasible action set at the next time instant under the current power allocation scheme $[P_{ir}, 0]$. Formula (4) is applied again to limit [Figure BDA0002767545850000205] in both cases, yielding [Figure BDA0002767545850000206].
The total amount of data transmitted over the two time slots, [Figure BDA0002767545850000207], can then be expressed as:
Figure BDA0002767545850000208
Inter-satellite link followed by a satellite-to-ground link: the optimal power allocation is solved by the Lagrange multiplier method.
Figure BDA0002767545850000209
Replacing [Figure BDA00027675458500002010] in formulas (5) and (6) with [Figure BDA00027675458500002011] gives $B'_i$ and $D'_i$, from which [Figure BDA00027675458500002012] is obtained as follows:
Figure BDA00027675458500002013
The optimal power is limited by formula (4) to obtain [Figure BDA00027675458500002014]. The total amount of data transmitted over the two time slots, [Figure BDA00027675458500002015], can then be expressed as:
Figure BDA00027675458500002016
Satellite-to-ground link followed by an inter-satellite link: the optimal power allocation scheme is solved by the Lagrange multiplier method.
Figure BDA00027675458500002017
The optimal power is limited by formula (4) to obtain [Figure BDA00027675458500002018]. The total amount of data transmitted over the two time slots, [Figure BDA00027675458500002019], can then be expressed as:
Figure BDA00027675458500002020
Vary $P_{ir}$ and compute the corresponding [Figure BDA0002767545850000211] for each value. By comparing the resulting [Figure BDA0002767545850000212] values, find the optimal power allocation scheme [Figure BDA0002767545850000213] at which [Figure BDA0002767545850000214] reaches its maximum. This scheme represents the power allocation under the current link.
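The final search over receive powers is an exhaustive comparison, which can be sketched in Python as follows. The two callables stand in for the case-dependent optimal transmit power and two-slot data formulas above, whose exact forms are equation images. Their names and signatures are hypothetical, and the toy stand-ins in the usage example are not the patent's formulas.

```python
def best_allocation(recv_set, optimal_tx_power, two_slot_data):
    """Return the (P_r, P_t) pair that maximizes the two-slot transmitted data.

    recv_set: candidate receive powers P_ir.
    optimal_tx_power(p_r): optimal transmit power for the current link case.
    two_slot_data(p_r, p_t): total data sent over slots i and i+1.
    """
    best, best_value = None, float("-inf")
    for p_r in recv_set:
        p_t = optimal_tx_power(p_r)
        value = two_slot_data(p_r, p_t)
        if value > best_value:                 # keep the first maximizer found
            best, best_value = (p_r, p_t), value
    return best


if __name__ == "__main__":
    # Toy stand-ins: receiving costs energy, leaving less for transmission.
    print(best_allocation([0, 30], lambda p_r: 50 - p_r,
                          lambda p_r, p_t: p_t + 0.5 * p_r))
```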
Fourth dimension: indicates whether network resources are fully used when energy is abundant, so that energy is not wasted. Its feature function $f_4(S_i, P_i)$ is:
Figure BDA0002767545850000215
where [Figure BDA0002767545850000216] is the maximum energy that can be consumed within the feasible action set of the remote sensing satellite in the current time slot.
Fifth dimension: in the second dimension, the feature value for data overflow is defined as 0, because the amount of data actually received does not match the energy spent, which wastes energy. The fifth dimension complements the second: it indicates that when energy is abundant, the energy wasted through data overflow is negligible. Its feature function $f_5(S_i, P_i)$ is:
Figure BDA0002767545850000217
Sixth dimension: represents the receive power allocation. Its feature function $f_6(S_i, P_i)$ is:
Figure BDA0002767545850000218
(4c) Select the action $P_i$ according to the ε-greedy policy: the remote sensing satellite selects the action that maximizes the approximate action-value function with probability $1-\varepsilon_i$, and selects an action at random with probability $\varepsilon_i$:
$$P_i = \begin{cases} \arg\max_{P \in \{A_f\}_i} \hat{Q}(S_i, P, \omega_{actor}), & \text{with probability } 1-\varepsilon_i \\ \text{a random action in } \{A_f\}_i, & \text{with probability } \varepsilon_i \end{cases}$$
where $\hat{Q}(S_i, P_i, \omega_{actor}) = \omega_{actor}^{\top} f_i(S_i, P_i)$ is the linear approximation of the action-value function.
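The selection rule in (4c) can be sketched in Python with a linear approximator, as below. The feature function in the usage example is hypothetical: the six real feature functions are equation images. Setting ε = 0 gives the purely greedy policy that claim 1 uses after training.

```python
import random


def q_hat(weights, features):
    """Linear action-value approximation: Q(S, P; w) = w . f(S, P)."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(actions, feature_fn, weights, epsilon, rng=random):
    """Select an action from the feasible set with an epsilon-greedy rule.

    actions: feasible (P_r, P_t) pairs; feature_fn(action) returns the
    six-dimensional feature vector for the current state (captured by closure).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)                    # explore with probability epsilon
    return max(actions, key=lambda a: q_hat(weights, feature_fn(a)))  # exploit


if __name__ == "__main__":
    feasible = [(0, 0), (0, 10), (30, 0)]
    # Hypothetical features: only the first two of six dimensions are non-zero here.
    feats = lambda a: [a[1] / 10, a[0] / 30, 0.0, 0.0, 0.0, 0.0]
    w = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    print(epsilon_greedy(feasible, feats, w, epsilon=0.0))
```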
Step 5, compute the reward of this power allocation scheme as $R_i = \tau_{ih} \cdot C_{it}(P_{it}, H_i)$. If i < I, proceed to the next step; otherwise go to Step 10.
Step 6, observe the pre-state $S'_i$ and use the ε-greedy policy to select an action $P'_i$ from the feasible action set as the pre-selected power allocation scheme for the next slot. Store the sample $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory for later network parameter updates.
(6a) Observe the pre-state $S'_i$:
Figure BDA00027675458500002111
$H'_i$ can be estimated as the average over historical time slots. The transitions of the states $B_i$ and $D_i$ can be expressed as follows:
Figure BDA00027675458500002112
Figure BDA00027675458500002113
where $D'_i$ is the pre-state of the data buffer and $B'_i$ is the pre-state of the battery charge.
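A hedged sketch of the pre-state transition in (6a) follows. The transition formulas are equation images, so this sketch assumes a plausible form. The battery charge drops by the slot's energy use $(P_{ir} + P_{it} + P_{cons})\tau$, gains the absorbed solar energy, and is capped at $B_{max}$; the capped-off excess is the overflow that the first feature dimension tries to avoid. The buffer gains received data, loses sent data, and is capped at $D_{max}$. $H'_i$ is the mean of past channel values, as the text states. The function name and argument list are hypothetical.

```python
from statistics import mean


def pre_transition(b, d, p_r, p_t, p_cons, tau, solar_j, recv_bits, sent_bits,
                   b_max, d_max, h_history):
    """Return the assumed pre-state (B'_i, D'_i, H'_i)."""
    energy_used = (p_r + p_t + p_cons) * tau          # J consumed in the slot
    b_next = min(b_max, b - energy_used + solar_j)    # excess solar energy overflows
    d_next = min(d_max, d + recv_bits - sent_bits)    # excess received data overflows
    h_next = mean(h_history)                          # estimate from historical values
    return b_next, d_next, h_next


if __name__ == "__main__":
    print(pre_transition(5000, 100, 0, 10, 10, 100, 3000, 50, 30,
                         7000, 110, [1.0, 2.0, 3.0]))
```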
(6b) Update $\varepsilon'_i$ according to the rule [Figure BDA0002767545850000221], where $\varepsilon'_i = \varepsilon_{i+1}$.
(6c) Select an action $P'_i$ from the feasible action set with the ε-greedy policy in the same way as in Step 4, except that the ε-greedy parameter $\varepsilon_i$ is replaced by $\varepsilon'_i$.
(6d) Store the sample $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory.
Step 7, check whether $i \,\%\, T_{copy} = 0$. If so, update $\omega_{critic}$ by setting $\omega_{critic} = \omega_{actor}$ and proceed to the next step; otherwise proceed directly to the next step.
Step 8, check whether $i \,\%\, T_{train} = 0$. If so, update $\omega_{actor}$ by gradient descent and proceed to the next step; otherwise proceed directly to the next step.
Referring to Fig. 4, this step is implemented as follows:
(8a) In the experience memory, group together the samples that share the same $P_i$, and denote the number of samples in each group by $M_P$.
(8b) For each group of samples, compute the cost function $Y(\omega_{actor})$:
Figure BDA0002767545850000222
Figure BDA0002767545850000223
Figure BDA0002767545850000224
where [Figure BDA0002767545850000225] is the approximate target value function.
(8c) Apply gradient descent to the cost function with respect to $\omega_{actor}$ to update $\omega_{actor}$:
Figure BDA0002767545850000226
where the subscript n is the time slot number corresponding to the sample, and [Figure BDA0002767545850000227] is the learning rate of the current time slot.
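The update in (8a) to (8c) can be sketched in Python as semi-gradient temporal-difference learning with a linear approximator and a fixed target (Critic) weight vector. This is an assumption, because the exact cost function and target survive only as equation images. The sketch takes the target as $R_n + \gamma\,\omega_{critic}^{\top} f'_n$, takes the per-group cost as half the mean squared error, and applies one gradient step per group of samples that share the same action. The function names are hypothetical.

```python
from collections import defaultdict


def dot(w, f):
    return sum(a * b for a, b in zip(w, f))


def train_step(memory, w_actor, w_critic, gamma, lr):
    """One pass of (8a)-(8c) under an assumed squared-TD cost.

    memory: samples (f, P, R, f_next, P_next), with P a hashable action tuple.
    Returns the updated actor weights and the pre-update cost of each group.
    """
    groups = defaultdict(list)
    for f, p, r, f_next, _ in memory:                 # (8a) group by action P
        groups[p].append((f, r, f_next))
    w = list(w_actor)
    losses = {}
    for p, samples in groups.items():
        m = len(samples)                              # M_P
        grad = [0.0] * len(w)
        loss = 0.0
        for f, r, f_next in samples:
            target = r + gamma * dot(w_critic, f_next)   # critic weights held fixed
            err = target - dot(w, f)
            loss += err * err / (2 * m)                  # (8b) cost Y(w_actor)
            for k, fk in enumerate(f):
                grad[k] -= err * fk / m                  # dY/dw_actor
        w = [wk - lr * gk for wk, gk in zip(w, grad)]    # (8c) gradient descent
        losses[p] = loss
    return w, losses


if __name__ == "__main__":
    mem = [([1.0, 0.0], (0, 10), 1.0, [0.0, 0.0], (0, 10))]
    print(train_step(mem, [0.0, 0.0], [0.0, 0.0], gamma=0.9, lr=0.5))
```

Holding the Critic weights fixed inside the target, and refreshing them only every $T_{copy}$ slots, is the standard target-network device. The summary below credits this kind of device with reducing overestimation.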
Step 9, update the remote sensing satellite network parameters: $S_{i+1} = S'_i$, $P_{i+1} = P'_i$, $i = i + 1$; then return to Step 5 for the next iteration.
Step 10, end the intelligent resource joint scheduling method for the remote sensing satellite network with an uncertain environment, and output $\omega_{critic}$ for resource joint scheduling.
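The control flow of Steps 4 to 10 can be sketched in Python as below, in particular the two modulo checks that schedule the Critic copy and the Actor training. The callbacks `select`, `step_env`, and `train` are hypothetical placeholders for Step 4, Steps 5 and 6, and Step 8. The sketch assumes the slot counter starts at i = 0 and returns update counts only for illustration.

```python
def run_schedule(total_slots, t_copy, t_train, select, step_env, train, w_init):
    """Skeleton of Steps 4-10; returns the output weights and update counts."""
    w_actor = list(w_init)
    w_critic = list(w_init)
    memory = []
    copies = trains = 0
    action = select(0, w_actor)                           # Step 4: choose P_0
    for i in range(total_slots):                          # Step 5 exits when i == I
        sample, next_action = step_env(i, action, w_actor)   # Steps 5-6
        memory.append(sample)                             # (6d) experience memory
        if i % t_copy == 0:                               # Step 7: copy Actor to Critic
            w_critic = list(w_actor)
            copies += 1
        if i % t_train == 0:                              # Step 8: train the Actor
            w_actor = train(memory, w_actor, w_critic)
            trains += 1
        action = next_action                              # Step 9: P_{i+1} = P'_i
    return w_critic, copies, trains                       # Step 10: output omega_critic


if __name__ == "__main__":
    out = run_schedule(12, 3, 2,
                       select=lambda i, w: 0,
                       step_env=lambda i, a, w: ((i, a), a),
                       train=lambda mem, wa, wc: [x + 1 for x in wa],
                       w_init=[0.0])
    print(out)
```

Because Step 7 runs before Step 8 in each slot, the Critic copy at a given slot takes the Actor weights from before that slot's training.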
The invention is better able to optimize remote sensing satellite transmission performance in a time-varying, unpredictable network environment. Building on reinforcement learning, it introduces a weight vector and establishes two independent networks with identical structure. Guided by the algorithm, the two networks continually update their parameters from accumulated experience, and the converged parameters guide multidimensional resource scheduling. The method suits causal networks whose statistical characteristics are unknown. It handles the continuous network state space and, to some extent, avoids overestimation during parameter updates. In short, the method adapts well to future remote sensing satellite networks operating in dynamic, randomly changing environments and provides guidance for their planning and optimization.
The convergence and effectiveness of the invention are illustrated below with simulation experiments:
Example 7
The intelligent resource joint scheduling method for the remote sensing satellite network with an uncertain environment is the same as in Embodiments 1-6.
Simulation conditions and content:
simulation software: STK, Matlab, Spyder;
Simulation scenario: the simulation scenario consists of 3 relay satellites, 6 ground stations, and 1 remote sensing satellite.
Simulation parameters: the satellite-to-ground transmit power set $A_{tsg}$ is {0:1:80}, the inter-satellite transmit power set $A_{tss}$ is {0:1:70}, and the receive power set $A_r$ is {0:30:30}. The satellite-to-ground channel bandwidth is 250 MHz. The satellite is assumed to have a fixed static power consumption of 10 W in every slot, and whenever data reception is selected, the receive rate is always 100 Mbit/s. The learning-process parameters are τ = 300 s, T = 288 slots, γ = 0.9, $T_{copy} = 3 \times T$ slots, $T_{train} = 2 \times T$ slots, I = 10002 × T slots, and $B_{min} = 0.6 \times B_{max}$.
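For reference, the Example 7 settings can be collected in a small Python configuration object. The {start:step:end} notation is read as MATLAB range syntax with an inclusive end, so the receive power set {0:30:30} is {0, 30}. This reading is an assumption, consistent with the MATLAB tooling named above. Note that T × τ = 288 × 300 s = 86,400 s, so one topology period spans one day. The class and helper names are hypothetical. Parameters the text leaves unspecified ($B_{max}$, $D_{max}$, ε, α) and the total slot count I are omitted rather than guessed.

```python
from dataclasses import dataclass, field


def matlab_range(start, step, stop):
    """Expand MATLAB-style start:step:stop notation (inclusive end) into a list."""
    n = int((stop - start) // step) + 1
    return [start + k * step for k in range(n)]


@dataclass
class Example7Params:
    """Parameters stated in Example 7; values not given there are left out."""
    tau_s: float = 300.0              # slot length tau (s)
    slots_per_period: int = 288       # T
    gamma: float = 0.9                # discount factor
    static_power_w: float = 10.0      # fixed static power consumption per slot
    sg_bandwidth_hz: float = 250e6    # satellite-to-ground channel bandwidth
    recv_rate_bps: float = 100e6      # receive rate whenever reception is selected
    tx_sg_w: list = field(default_factory=lambda: matlab_range(0, 1, 80))  # A_tsg (W)
    tx_ss_w: list = field(default_factory=lambda: matlab_range(0, 1, 70))  # A_tss (W)
    rx_w: list = field(default_factory=lambda: matlab_range(0, 30, 30))    # A_r (W)

    @property
    def t_copy(self):
        return 3 * self.slots_per_period   # Critic copy interval (slots)

    @property
    def t_train(self):
        return 2 * self.slots_per_period   # Actor training interval (slots)

    def b_min(self, b_max):
        return 0.6 * b_max                 # battery threshold B_min = 0.6 * B_max


if __name__ == "__main__":
    p = Example7Params()
    print(p.slots_per_period * p.tau_s, p.rx_w, p.t_copy, p.t_train)
```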
Simulation content: using the simulation scenario, the simulation software, and the network topology shown in Fig. 2, the convergence of the proposed method is demonstrated first. Then, with the data buffer capacity as the resource variable, its effectiveness is demonstrated by comparison with three other methods.
Analysis of simulation results:
Referring to Fig. 5, Fig. 5 shows the cost values obtained in simulation. The abscissa is the Actor network parameter update slot and the ordinate is the cost value. The dashed line shows the cost value of the satellite-to-ground link and the solid line shows that of the inter-satellite link. Fig. 5 uses the cost value to measure how closely the approximate action-value function matches the action-value function. As Fig. 5 shows, for both the satellite-to-ground link and the inter-satellite link, the cost value first decreases and then converges as learning proceeds. This is because the Actor network parameters of both links are updated by gradient descent, i.e. always in the direction opposite to the gradient, so the error between the approximate action-value function and the action-value function gradually shrinks. In addition, sample generation is affected by the ε-greedy policy, so overall the cost value declines while fluctuating. Once learning has run long enough, the exploration rate and the learning rate become small and the network parameters change little between updates, which appears in the figure as convergence of the cost value. The simulated cost curve agrees with the theoretical analysis and verifies the convergence of the method.
Example 8
The intelligent resource joint scheduling method for the remote sensing satellite network with an uncertain environment is the same as in Embodiments 1-6, and the simulation conditions and content are the same as in Example 7.
Referring to Fig. 6, Fig. 6 compares the average period rewards obtained in simulation. The abscissa is the data buffer capacity and the ordinate is the average period reward of the remote sensing satellite network. The solid dotted lines represent the proposed intelligent resource joint scheduling method and the greedy resource joint scheduling method, respectively. The solid line with square markers represents the Q-learning resource joint scheduling method, and the solid line with triangle markers represents the random resource joint scheduling method. Using the average amount of data transmitted per period as the performance metric, Fig. 6 shows how changes in the data buffer margin affect network performance under each method and how the four methods differ. Overall, the proposed method performs best, followed by the greedy method and then the Q-learning method, and the random method performs worst. As Fig. 6 shows, with the resources in the other two dimensions fixed, the performance of all four methods first rises and then levels off as the data buffer margin increases. The plateau is performance saturation caused by the limits on the other resources. Notably, when the data buffer margin is small, the proposed method and the greedy method perform similarly. In that case, transmitting the stored data requires little energy, and the other resources are relatively abundant compared with the small buffer margin, so the energy demand can always be met. The energy storage and reuse performed by the proposed method then brings little benefit, and its performance approaches that of the greedy method.
In short, the invention discloses an intelligent resource joint scheduling method for remote sensing satellite networks with an uncertain environment, which optimizes remote sensing satellite transmission performance in a time-varying, unpredictable network environment. The implementation comprises the following steps: establishing a remote sensing satellite network model with an uncertain environment; generating environmental parameter data; initializing the parameters required by the method; directing the satellite to perform power allocation; pre-transferring the state of the remote sensing satellite network; directing the satellite to perform power pre-allocation; checking for and performing the update of the Critic network parameter $\omega_{critic}$; checking for and performing the update of the Actor network parameter $\omega_{actor}$; updating the state, action, and current slot number of the network; and obtaining the network parameter $\omega_{critic}$ that guides joint scheduling and thus multidimensional resource scheduling. The invention fully considers the transmission characteristics of remote sensing satellite tasks and the particularities of their network environment, and it obtains environmental parameter data for a given network scale and parameter set through software simulation. In addition, it defines six-dimensional feature functions based on reinforcement learning and linearly approximates the action-value function with a weight vector. Two independent networks with identical structure continually update their parameters from accumulated experience. The method suits remote sensing satellite networks with unknown statistical characteristics. It handles the continuous network state space and, to some extent, avoids overestimation during parameter updates.
The invention adapts well to future remote sensing satellite networks operating in dynamic, randomly changing environments and provides guidance for their network planning and optimization.

Claims (5)

1. An intelligent resource joint scheduling method for a remote sensing satellite network with an uncertain environment, characterized in that the established network model suits the environment of the remote sensing satellite network and its resource scheduling scenario, and in that reinforcement learning avoids both directly solving a high-complexity planning problem and handling a continuous, infinite state space, the method comprising the following steps:
(1) Establish a remote sensing satellite network model with an uncertain environment: first determine the scale and parameters of the network, including the number and positions of the remote sensing satellites and ground stations, and then define the state set S, action set A, reward R, and action-value function $Q^{\pi}(S_i, P_i)$ of the network.
The state set is $S = \{B \times D \times H \times E^{H}\}$. At the start of the i-th slot, the network state $S_i$ comprises four parts: the current battery charge $B_i$, the amount of data in the data buffer $D_i$, the channel parameter $H_i$, and the absorbed solar energy $E_i^{H}$. Following ITU-R Recommendations P.618-13, P.838, and P.839, dynamic channel models of the satellite-to-ground and inter-satellite links are established, and the channel parameter $H_i$ is obtained by simulation. Considering the orbital characteristics of the satellite, a dynamic energy harvesting model is established, and the absorbed solar energy $E_i^{H}$ is obtained by simulation.
The action set $A = \{A_r \times A_t\}$ comprises the receive power set $A_r$ and the transmit power set $A_t$, expressed respectively as $A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$, where δ is the step size, 0 means that no data is received or transmitted, and $P_{MAX}$ is the maximum power value. When the transmission link is a satellite-to-ground link, [Figure FDA0002767545840000016]; otherwise, [Figure FDA0002767545840000017].
The reward R is the amount of data sent by the satellite in the time slot, starting from the slot's start time. The action-value function $Q^{\pi}(S_i, P_i)$ is the expected return obtained when the agent, guided by policy π, performs action $P_i$ in state $S_i$. This completes the establishment of the remote sensing satellite network model with an uncertain environment;
(2) Generate environmental parameter data: simulate the remote sensing satellite network model in STK to export raw environmental parameter data for one topology period, and process the data in MATLAB to obtain, for each slot, the link on/off status, link connection duration, remote sensing satellite position, and time spent in sunlight. These serve as the environmental parameter data of the intelligent resource joint scheduling method;
(3) Initialize the parameters required by the intelligent resource joint scheduling method: the number of slots per period T, on-board battery capacity $B_{max}$, battery capacity threshold $B_{min}$, data storage capacity $D_{max}$, static power consumption $P_{cons}$, unit slot length τ, exploration rate ε, Critic network parameter $\omega_{critic}$, Actor network parameter $\omega_{actor}$, learning rate α, Critic network parameter update interval $T_{copy}$, Actor network parameter update interval $T_{train}$, total number of training slots I, current slot number i, and discount factor γ;
(4) Direct the satellite to perform power allocation: observe the state $S_i$. For each feasible action, extract the feature vector $f_i(S_i, P_i)$ of the state-action pair using the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment. Combine it with the Actor network parameter $\omega_{actor}$ and use the ε-greedy policy to select an action $P_i$ from the feasible action set as the power allocation scheme for the current slot, thereby directing the satellite's power allocation;
(5) Pre-transfer the state of the remote sensing satellite network: compute the reward $R_i$ in the network model and check whether the iteration is finished: if i = I, go to step (10); otherwise proceed to the next step and execute a new iteration;
(6) Direct the satellite to perform power pre-allocation: observe the pre-state $S'_i$. For each feasible action, extract the feature vector $f'_i(S'_i, P'_i)$ of the state-action pair using the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment. Combine it with the Actor network parameter $\omega_{actor}$ and use the ε-greedy policy to select an action $P'_i$ from the feasible action set as the pre-selected power allocation scheme for the next slot. Store the sample $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory for later network parameter updates;
(7) Check for and perform the update of the Critic network parameter $\omega_{critic}$: take the current slot number i modulo the Critic network parameter update interval $T_{copy}$. If $i \,\%\, T_{copy} = 0$, update the Critic network parameter by setting $\omega_{critic} = \omega_{actor}$ and proceed to the next step; otherwise proceed directly to the next step;
(8) Check for and perform the update of the Actor network parameter $\omega_{actor}$: take the current slot number i modulo the Actor network parameter update interval $T_{train}$. If $i \,\%\, T_{train} = 0$, update the Actor network parameter $\omega_{actor}$ by gradient descent and proceed to the next step; otherwise proceed directly to the next step;
(9) Update the state, action, and current slot number of the remote sensing satellite network: $S_{i+1} = S'_i$, $P_{i+1} = P'_i$, $i = i + 1$. This completes one iteration; then go to step (5);
(10) Obtain the network parameter $\omega_{critic}$ that guides joint scheduling: output the network parameter $\omega_{critic}$ trained by the intelligent resource joint scheduling method, which ends the method. In practical applications, a resource joint scheduling scheme is generated from this parameter with the greedy policy (i.e. the ε-greedy policy with ε = 0).
2. The intelligent resource joint scheduling method for a remote sensing satellite network with an uncertain environment according to claim 1, wherein, in step (4) of directing the satellite to perform power allocation, selecting an action $P_i$ from the feasible action set with the ε-greedy policy comprises the following steps:
(4a) Compute the feasible action set $\{A_f\}_i$: from the network state $S_i$, compute the feasible action set $\{A_f\}_i$ of all actions $P_i$ that satisfy the following conditions. Each action consists of a receive power $P_{ir}$ and a transmit power $P_{it}$:
Figure FDA0002767545840000031
where τ_ih denotes the link duration at the start of the i-th time slot, and C_it(P_it, H_i) denotes the link transmission rate at the start of the i-th time slot under the current transmit power P_it and channel parameter H_i, calculated as follows:
Figure FDA0002767545840000032
where B_c (Hz) denotes the channel bandwidth;
(4b) Compute the six-dimensional feature vector f_i(S_i, P_i): for each state-action pair (S_i, P_i), compute the feature vector f_i(S_i, P_i) with six feature functions. Each element of the feature vector is a function of state and action whose value, which does not exceed 1, quantifies the action P_i performed in state S_i along the corresponding dimension;
(4c) selecting action P according to epsilon-greedy policyi: according to the epsilon-greedy strategy, the epsilon-greedy strategy means that the remote sensing satellite has a feasible action set { A }f}iIn which either the action with the highest value of the approximate action is chosen with a probability of 1-epsilon or the action is chosen randomly with a probability of epsilon, only one result being the action P chosen according to the epsilon-greedy strategyiExpressed as follows:
Figure FDA0002767545840000033
wherein,
Figure FDA0002767545840000034
denotes the approximate action-value function in the i-th time slot, based on the state-action pair (S_i, P_i) and the Actor network parameter ω_actor, and
Figure FDA0002767545840000035
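The feasibility condition and the selection formula in claim 2 survive only as images, so the sketch below illustrates steps (4a) and (4c) under explicit assumptions rather than reproducing them. Assumed, not from the source: a Shannon-type rate C = B_c · log2(1 + P_t · h); two hypothetical feasibility checks (slot energy within the battery, bits sent within the buffer); a power grid with step `step`; and a linear approximation Q = ω · f, which is consistent with a six-dimensional feature vector but not confirmed by the text. All function names are illustrative.

```python
import math
import random
from itertools import product


def shannon_rate(p_t, h, bandwidth_hz):
    # ASSUMED rate model C = B_c * log2(1 + p_t * h), h = gain-to-noise ratio.
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def feasible_actions(battery_j, buffer_bits, h, bandwidth_hz, tau, tau_link, step, p_max):
    """Step (4a) sketch: enumerate (P_r, P_t) on a power grid, keep feasible pairs.

    HYPOTHETICAL constraints: energy used in the slot must not exceed the
    battery, and bits sent must not exceed the bits waiting in the buffer.
    """
    grid = [k * step for k in range(int(p_max // step) + 1)]
    t_tx = min(tau, tau_link)  # cannot transmit after the link ends
    actions = []
    for p_r, p_t in product(grid, grid):
        energy = p_r * tau + p_t * t_tx
        bits = shannon_rate(p_t, h, bandwidth_hz) * t_tx
        if energy <= battery_j and bits <= buffer_bits:
            actions.append((p_r, p_t))
    return actions


def q_value(features, omega):
    # ASSUMED linear approximation: Q(S, P; omega) = omega . f(S, P)
    return sum(w * x for w, x in zip(omega, features))


def epsilon_greedy(actions, feature_fn, omega, epsilon, rng=random):
    """Step (4c): explore uniformly with probability epsilon, else exploit.

    feature_fn(action) returns f(S_i, action) for the current state S_i.
    epsilon = 0 gives the pure greedy policy used in step (10).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(feature_fn(a), omega))


acts = feasible_actions(battery_j=3.0, buffer_bits=1e9, h=1.0, bandwidth_hz=1.0,
                        tau=1.0, tau_link=1.0, step=1.0, p_max=2.0)
print(len(acts))  # 8: every grid pair except (2.0, 2.0)
best = epsilon_greedy(acts, lambda a: [a[0], a[1]], omega=[1.0, 0.5], epsilon=0.0)
print(best)       # (2.0, 1.0)
```

Passing a seeded `random.Random` instance as `rng` makes the exploration step reproducible in tests.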
3. The intelligent resource joint scheduling method according to claim 1, wherein computing the six-dimensional feature vector f_i(S_i, P_i) in step (4b) specifically takes the following six dimensions into account:
(4b1) Compute the first dimension: the first dimension indicates whether the action takes the battery energy state into account, i.e., whether the energy consumed by performing the action can eliminate the potential energy overflow caused by absorbing solar energy. Its feature function f_1(S_i, P_i) is expressed as follows:
Figure FDA0002767545840000041
wherein,
Figure FDA0002767545840000042
denotes the energy consumption of the current time slot.
(4b2) Compute the second dimension: the second dimension indicates whether the action takes the data buffer state into account, i.e., whether the amount of data sent can eliminate the potential data overflow caused by received data. Its feature function f_2(S_i, P_i) is expressed as follows:
Figure FDA0002767545840000043
wherein,
Figure FDA0002767545840000044
denotes the amount of data received by the satellite in the i-th time slot,
Figure FDA0002767545840000045
denotes the amount of data transmitted by the satellite in the i-th time slot,
Figure FDA0002767545840000046
where DR_max denotes the maximum amount of received data;
(4b3) Compute the third dimension: the third dimension indicates whether the action is consistent with the optimal power allocation. Because the satellite-ground link and the inter-satellite link are modeled differently, the feature function f_3(S_i, P_i) must be expressed separately for each link condition:
Figure FDA0002767545840000047
where
Figure FDA0002767545840000048
consists of two parts, the received power and the transmit power. It is obtained with the Lagrange multiplier method, with the objective of maximizing the total amount of data transmitted in the current and next time slots under multidimensional resource constraints. Four cases are distinguished according to the link switching situation.
For two consecutive satellite-ground links, the optimal power is solved according to the water-filling theorem:
Figure FDA0002767545840000049
Figure FDA00027675458400000410
Figure FDA0002767545840000051
where B_s = P_const × τ denotes the static energy consumption, P_i^WF denotes the optimal power value, and
Figure FDA0002767545840000052
is the average of the historical channel parameters, used to estimate the channel parameter at the next time instant. B_[i,i+1] denotes the maximum energy available for data transmission across the i-th and (i+1)-th time slots. To guarantee the feasibility of the optimal power, the invention imposes the following constraint:
Figure FDA0002767545840000053
where
Figure FDA0002767545840000054
denotes the maximum transmit power in the current feasible action set,
Figure FDA0002767545840000055
denotes the floor (round-down) operation, and δ_i denotes the step size of the transmit power set in this time slot. The total amount of data transmitted over the two time slots,
Figure FDA0002767545840000056
can then be expressed as follows:
Figure FDA0002767545840000057
where
Figure FDA0002767545840000058
denotes, given the current-time power allocation
Figure FDA0002767545840000059
the maximum transmit power in the feasible action set after the transition to the next time instant.
For two consecutive inter-satellite links, the optimal power is solved by linear programming:
Figure FDA00027675458400000510
where B'_i and D'_i denote the resources still available at the current time once the resource allocation at the next time is known. They can be calculated as follows:
Figure FDA00027675458400000511
Figure FDA00027675458400000512
where
Figure FDA00027675458400000513
denotes, given the current-time power allocation [P_ir, 0], the maximum transmit power in the feasible action set at the next time. Formula (1) is then applied again to constrain, for both cases,
Figure FDA00027675458400000514
yielding
Figure FDA00027675458400000515
The total amount of data transmitted over the two time slots,
Figure FDA00027675458400000516
is then expressed as follows:
Figure FDA0002767545840000061
For a switch from an inter-satellite link to a satellite-ground link, the optimal power allocation is solved by the Lagrange multiplier method:
Figure FDA0002767545840000062
Replacing, in formulas (2) and (3),
Figure FDA0002767545840000063
with
Figure FDA0002767545840000064
yields B'_i and D'_i, from which
Figure FDA0002767545840000065
is obtained as follows:
Figure FDA0002767545840000066
Constraining the optimal power by formula (1) gives
Figure FDA0002767545840000067
and the total amount of data transmitted over the two time slots,
Figure FDA0002767545840000068
is then expressed as follows:
Figure FDA0002767545840000069
For a switch from a satellite-ground link to an inter-satellite link, the optimal power allocation is likewise solved by the Lagrange multiplier method:
Figure FDA00027675458400000610
Constraining the optimal power by formula (1) gives
Figure FDA00027675458400000611
and the total amount of data transmitted over the two time slots,
Figure FDA00027675458400000612
can be expressed as follows:
Figure FDA00027675458400000613
In summary, P_ir is varied and the corresponding
Figure FDA00027675458400000614
is computed for each value; by comparing the corresponding
Figure FDA00027675458400000615
values, the optimal power allocation
Figure FDA00027675458400000616
that maximizes
Figure FDA00027675458400000617
is found, which characterizes the optimal power allocation under the current link;
(4b4) Compute the fourth dimension: the fourth dimension indicates whether network resources are fully utilized when energy is abundant, so as to avoid wasting energy. Its feature function f_4(S_i, P_i) is expressed as follows:
Figure FDA00027675458400000618
wherein,
Figure FDA00027675458400000619
denotes the maximum energy that can be consumed within the feasible action set of the remote sensing satellite in the current time slot;
(4b5) Compute the fifth dimension: in the second dimension, the feature value corresponding to data overflow is defined as 0, because the amount of data actually received does not match the energy expended, which wastes energy. The fifth dimension supplements the second: when energy is abundant, the energy wasted by data overflow is negligible. Its feature function f_5(S_i, P_i) is expressed as follows:
Figure FDA0002767545840000071
(4b6) Compute the sixth dimension: the sixth dimension represents the received power allocation. Its feature function f_6(S_i, P_i) is expressed as follows:
Figure FDA0002767545840000072
where f_6(S_i, P_i) is the sixth feature function of the i-th time slot;
The outputs of the six feature functions form the six-dimensional feature vector that guides satellite power allocation and serve as an important basis for selecting the action P_i in step (4c).
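The feature and optimal-power formulas of claim 3 are given only as images, so the following sketch illustrates three ideas from the text and is not a transcription of them. First, `overflow_feature` is a purely hypothetical way to express "does the action drain enough to avoid overflow" as a value in [0, 1], in the spirit of f_1 and f_2. Second, `water_filling` is the textbook water-filling allocation, assumed to apply to the two-slot satellite-ground case, with `gains` standing for estimated gain-to-noise ratios (for example from the historical channel average) and `budget` assumed to be B_[i,i+1]. Third, `quantize` floors a power onto the grid with step δ_i and caps it, and `best_allocation` performs the sweep over P_ir with hypothetical placeholder callables.

```python
import math


def overflow_feature(level, inflow, outflow, capacity):
    """HYPOTHETICAL form for f_1 / f_2: does the action drain enough to avoid overflow?

    level: current store (battery energy or buffered data); inflow: arrivals this
    slot (harvested solar energy or received data); outflow: what the action
    removes (energy consumed or data sent). Result lies in [0, 1].
    """
    overflow = max(0.0, level + inflow - capacity)  # overflow if nothing is drained
    if overflow == 0.0:
        return 1.0
    return min(1.0, outflow / overflow)


def water_filling(gains, budget, tau, iters=100):
    """Textbook water-filling of `budget` energy over slots of length tau.

    Slot k gets P_k = max(0, mu - 1/gains[k]); the water level mu is found by
    bisection so that tau * sum(P_k) does not exceed the budget.
    """
    lo, hi = 0.0, budget / tau + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = tau * sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > budget:
            hi = mu
        else:
            lo = mu
    return [max(0.0, lo - 1.0 / g) for g in gains]


def quantize(p, step, p_max):
    # Floor onto the transmit-power grid with step delta_i, then cap at p_max.
    return min(p_max, math.floor(p / step) * step)


def best_allocation(receive_powers, optimal_transmit, total_data):
    """Sweep P_ir and keep the (P_ir, P_it, data) triple with the most data.

    optimal_transmit and total_data are placeholders for the case-specific
    formulas (water-filling, linear programming, or Lagrange multipliers).
    """
    best = None
    for p_ir in receive_powers:
        p_it = optimal_transmit(p_ir)
        data = total_data(p_ir, p_it)
        if best is None or data > best[2]:
            best = (p_ir, p_it, data)
    return best


print(overflow_feature(90.0, 20.0, 5.0, 100.0))  # 0.5
print(water_filling([1.0, 0.25], budget=1.0, tau=1.0))  # approx [1.0, 0.0]
print(quantize(0.9, 0.25, 2.0))  # 0.75
print(best_allocation([0, 1, 2], lambda r: 3 - r, lambda r, t: r * t))  # (1, 2, 2)
```

In the water-filling call, the second slot's channel is too weak to reach the water level, so the whole budget goes to the first slot. This is the usual behaviour that makes water-filling attractive for spreading energy across a current slot and a predicted next slot.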
4. The intelligent resource joint scheduling method according to claim 1, wherein step (6) uses the same ε-greedy policy as step (4), and the exploration rate ε'_i used in the update in step (6) is updated according to ε'_i = ε_{i+1}, and
Figure FDA0002767545840000073
5. The intelligent resource joint scheduling method according to claim 1, wherein updating ω_actor according to a gradient descent strategy in step (8) comprises the following steps:
(8a) Sampling: samples (f_i, P_i, R_i, f′_i, P′_i) in the experience memory that share the same P_i are grouped together, and the number of samples in each group is denoted M_P;
(8b) Compute the cost function Y(ω_actor) of each group of samples: for each group, compute the cost function Y(ω_actor):
Figure FDA0002767545840000074
Figure FDA0002767545840000075
Figure FDA0002767545840000076
where
Figure FDA0002767545840000077
denotes the approximate target value function;
(8c) Update ω_actor: apply a gradient descent strategy to the cost function with respect to ω_actor to complete the update of ω_actor:
Figure FDA0002767545840000081
ω_actor = ω_actor − Δω_actor
where the subscript n denotes the time-slot number corresponding to the sample, and
Figure FDA0002767545840000082
denotes the learning rate of the current time slot. This assignment completes the update of the Actor network parameter ω_actor.
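Since the cost function and gradient in claim 5 appear only as images, here is a minimal sketch of one gradient-descent step on a single sample group. It rests on stated assumptions, not the source's formulas: a linear value function Q = ω · f, a squared-error cost averaged over the M_P samples of the group, a target built from the Critic weights as R_n + γ · ω_critic · f′_n, and a discount factor `gamma` that the text does not mention. The action entries P_n and P′_n are dropped from each sample because every sample in a group shares the same P_i and the features already encode the action.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def actor_update(omega_actor, omega_critic, group, lr, gamma=0.9):
    """One gradient-descent step (8b)-(8c) on a group of samples sharing P_i.

    Each sample is (f, reward, f_next): features of (S_n, P_n), reward R_n, and
    features of (S'_n, P'_n). ASSUMED loss with a linear value function:
        Y = 1/(2M) * sum_n (R_n + gamma * omega_critic.f_next - omega_actor.f)^2
    whose gradient w.r.t. omega_actor is -1/M * sum_n td_n * f_n.
    """
    m = len(group)
    grad = [0.0] * len(omega_actor)
    for f, reward, f_next in group:
        target = reward + gamma * dot(omega_critic, f_next)  # Critic supplies the target
        td = target - dot(omega_actor, f)
        for k, x in enumerate(f):
            grad[k] -= td * x / m
    delta = [lr * g for g in grad]                      # delta omega_actor
    return [w - d for w, d in zip(omega_actor, delta)]  # omega_actor - delta omega_actor


new = actor_update([0.0, 0.0], [0.0, 0.0], [([1.0, 0.0], 1.0, [0.0, 0.0])], lr=0.5)
print(new)  # [0.5, 0.0]
```

Holding ω_critic fixed inside the target, and refreshing it only every T_copy slots, keeps the regression target stable between copies.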
CN202011251365.3A 2020-11-09 2020-11-09 Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network Active CN112422171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011251365.3A CN112422171B (en) 2020-11-09 2020-11-09 Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network

Publications (2)

Publication Number Publication Date
CN112422171A true CN112422171A (en) 2021-02-26
CN112422171B CN112422171B (en) 2021-09-03

Family

ID=74781837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011251365.3A Active CN112422171B (en) 2020-11-09 2020-11-09 Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network

Country Status (1)

Country Link
CN (1) CN112422171B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643860A (en) * 2002-03-28 2005-07-20 Marconi UK Intellectual Property Ltd. Method and arrangement for dinamic allocation of network resources
US20070021998A1 (en) * 2005-06-27 2007-01-25 Road Ltd. Resource scheduling method and system
CN104573856A (en) * 2014-12-25 2015-04-29 Beijing Institute of Technology Spacecraft resource constraint processing method based on time topological sorting
CN106060858A (en) * 2016-05-18 2016-10-26 Soochow University Method and apparatus for software defining satellite networking based on OpenFlow extended protocol
CN106100719A (en) * 2016-06-06 2016-11-09 Xidian University Moonlet network efficient resource dispatching method based on earth observation task
CN106230497A (en) * 2016-09-27 2016-12-14 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences A kind of Information Network resource bilayer dispatching method and system
CN110099388A (en) * 2019-03-21 2019-08-06 Shixun Satellite Technology Co., Ltd. A kind of satellite mobile communication method with the 5G network integration
US20200019435A1 (en) * 2018-07-13 2020-01-16 Raytheon Company Dynamic optimizing task scheduling

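Non-Patent Citations (4)

Title
YU WANG et al.: "Joint Scheduling of Observation and Transmission in Earth Observation Satellite Networks", GLOBECOM 2017 - 2017 IEEE GLOBAL COMMUNICATIONS CONFERENCE *
周笛 (ZHOU Di) et al.: "Network operation and maintenance and resource management and control technologies for mega-constellation systems", Space-Integrated-Ground Information Networks *
慈元卓 (CI Yuanzhuo) et al.: "Research on the multi-satellite joint observation scheduling problem under uncertain environments", Systems Engineering and Electronics *
李玉庆 (LI Yuqing): "Research on the spacecraft observation scheduling problem in dynamic uncertain environments", China Doctoral Dissertations Full-text Database *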

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN113133078B (en) * 2021-04-19 2022-04-08 Xidian University Light-weight inter-satellite switching device and method for giant low-orbit satellite network
CN113133078A (en) * 2021-04-19 2021-07-16 Xidian University Light-weight inter-satellite switching device and method for giant low-orbit satellite network
CN113378366A (en) * 2021-06-03 2021-09-10 Beijing University of Civil Engineering and Architecture Guidance information layout method for guidance sign of comprehensive passenger transport hub
CN113378366B (en) * 2021-06-03 2023-08-18 Beijing University of Civil Engineering and Architecture Comprehensive passenger transport hub guiding identification guiding information layout method
CN113572517B (en) * 2021-07-30 2022-06-24 Harbin Institute of Technology Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning
CN113572517A (en) * 2021-07-30 2021-10-29 Harbin Institute of Technology Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning
CN113613301B (en) * 2021-08-04 2022-05-13 Beihang University Air-ground integrated network intelligent switching method based on DQN
CN113613301A (en) * 2021-08-04 2021-11-05 Beihang University Air-space-ground integrated network intelligent switching method based on DQN
CN114726431A (en) * 2022-03-02 2022-07-08 Wuhan University Low-earth-orbit satellite constellation-oriented beam hopping multiple access method
CN114726431B (en) * 2022-03-02 2023-12-12 National Computer Network and Information Security Management Center Wave beam hopping multiple access method facing low orbit satellite constellation
CN116436510A (en) * 2023-04-28 2023-07-14 GalaxySpace (Chengdu) Communication Co., Ltd. Method, device and storage medium for transmitting application data by using relay satellite
CN116436510B (en) * 2023-04-28 2024-05-17 GalaxySpace (Chengdu) Communication Co., Ltd. Method, device and storage medium for transmitting application data by using relay satellite
CN117459122A (en) * 2023-11-09 2024-01-26 Chengdu Benyuan Xingtong Technology Co., Ltd. Resource allocation method for coordinating scheduling mechanism of terminal equipment and satellite service life
CN117459122B (en) * 2023-11-09 2024-07-05 Chengdu Benyuan Xingtong Technology Co., Ltd. Resource allocation method for coordinating scheduling mechanism of terminal equipment and satellite service life

Also Published As

Publication number Publication date
CN112422171B (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant