CN112422171A - Intelligent resource joint scheduling method for remote sensing satellite networks in uncertain environments
- Publication number: CN112422171A (application number CN202011251365.3A)
- Authority: CN (China)
- Legal status: Granted (the listed status is an assumption, not a legal conclusion)
Classifications
- H04B7/18513: Transmission in a satellite or space-based system
- H04B7/18519: Operations control, administration or maintenance
- H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
- H04W24/02: Arrangements for optimising operational condition
- H04W24/06: Testing, supervising or monitoring using simulated traffic
- H04W28/16: Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W72/0446: Resources in time domain, e.g. slots or frames
- H04W72/0473: Wireless resource allocation based on the type of the allocated resource, the resource being transmission power
Abstract
The invention discloses an intelligent resource joint scheduling method for remote sensing satellite networks in uncertain environments, which addresses the problem of optimizing remote sensing satellite transmission performance in a time-varying and unpredictable network environment. The implementation comprises the following steps: establishing a remote sensing satellite network model for an uncertain environment; generating environmental parameter data; initializing the required parameters; guiding satellite power allocation; pre-transitioning the network state; guiding satellite power pre-allocation; updating and checking the network parameters; updating the network state, the action and the current time slot number; and obtaining the network parameters that guide multi-dimensional resource scheduling. The invention obtains network environment parameter data for a network of given scale and parameters through software simulation, defines a six-dimensional feature function, and approximates the action value function linearly with a weight vector. The invention handles the continuous network state space, mitigates overestimation during parameter updates, adapts to future remote sensing satellite networks operating in dynamically and randomly varying environments, and provides guidance for network planning and network optimization.
Description
Technical Field
The invention relates to the technical field of satellite communication, mainly to joint resource scheduling in remote sensing satellite networks, and in particular to an intelligent dynamic resource joint scheduling method for remote sensing satellite networks in uncertain environments. It can be used in remote sensing satellite networks operating in time-varying and unpredictable environments.
Background
Compared with terrestrial communication networks, satellite networks offer long communication distances, high communication quality, service coverage that is not limited by geography, and immunity to natural disasters. In recent years, as demand for timely, high-precision and high-utility remote sensing data has grown rapidly, China has continuously increased its investment in and construction of satellite remote sensing services, and remote sensing satellite networks increasingly demonstrate their social and economic value, their close connection with the national economic development strategy, and their large development potential. Remote sensing is now widely used in agricultural assessment, ecological environment monitoring, weather forecasting, disaster prevention and mitigation, and other areas closely related to daily life. A remote sensing satellite network consists of remote sensing satellites, relay satellites and ground stations. Remote sensing data are acquired by a remote sensing satellite and transmitted to a ground station either directly or with the assistance of a relay satellite. Resources are the basis of remote sensing satellite network operation, and resource scheduling directly affects transmission performance, so research on multidimensional resource joint scheduling methods is of great significance.
Because of its orbital motion, a remote sensing satellite periodically alternates between the sunlit side of the Earth and the Earth's shadow. On the sunlit side, solar energy is available, but the amount supplied is random and unpredictable because of solar panel (sail) losses, ion radiation and similar effects. In the Earth's shadow, no solar energy is available and power can be supplied only by the on-board battery. Meanwhile, during normal operation the satellite incurs both routine static energy consumption, which keeps its systems running, and dynamic energy consumption (task acquisition and transmission) determined by resource scheduling. In addition, because satellites have different orbital characteristics, the topology of the remote sensing satellite network is another time-varying factor that seriously affects task transmission. In conclusion, designing an efficient multidimensional resource joint scheduling method that optimizes the long-term performance of the network is an important and urgent research problem.
Existing resource scheduling methods fall mainly into static and dynamic types. A static algorithm requires all unknown environmental data within the service time, including solar energy arrivals and channel conditions, to be obtained before task transmission, and then solves the resource scheduling problem by global planning. Dynamic algorithms do not require predicted data and can be further divided into two categories depending on whether they rely on statistical features. The first category includes the BIBM algorithm proposed by D. Zhou et al. in "Session QoS and Satellite Service Lifetime handoff in Remote Sensing Satellite Networks", published in IEEE Wireless Communications Letters; this algorithm jointly considers task acquisition, processing and sending, and solves the resource scheduling problem based on a state transition probability model. The second category includes the approximate SARSA algorithm described by A. Ortiz et al. in "Reinforcement Learning for Energy Harvesting Point-to-Point Communications", 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, which obtains the optimal resource scheduling policy from causal data alone, without needing to know statistical characteristics.
These two types of methods essentially cover the current state of research on resource scheduling. Static algorithms are overly idealized: because they require non-causal information, they cannot be applied to remote sensing satellite networks. The first category of dynamic algorithms is equally inapplicable, since remote sensing satellite networks do not have fixed, deterministic statistical features. The second category can be used to solve the resource scheduling problem of remote sensing satellite networks, but current research mostly focuses on generic energy harvesting systems and does not consider the task flow of remote sensing satellite networks or the particular environment in which they operate.
Disclosure of Invention
In view of the shortcomings and limitations of the prior art, the invention aims to design an intelligent multi-dimensional resource joint scheduling method that does not depend on known statistical characteristics and is suited to the environment and resource scheduling scenario of remote sensing satellite networks in uncertain environments.
In the intelligent resource joint scheduling method for remote sensing satellite networks in uncertain environments according to the invention, the established network model matches the environment and resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids both directly solving a high-complexity planning problem and the difficulty of a continuous, infinite state space. The method comprises the following steps:
(1) Establish a remote sensing satellite network model for an uncertain environment: first determine the scale and parameters of the remote sensing satellite network, including the number and positions of the remote sensing satellites and ground stations; then define the state set $S$, the action set $A$, the reward $R$ and the action value function $Q^{\pi}(S_i, P_i)$ of the network. The state set is $S = \{B \times D \times H \times E^H\}$. At the start of the $i$-th time slot, the network state $S_i = (B_i, D_i, H_i, E^H_i)$ consists of four parts: the current battery charge $B_i$, the amount of data already in the data buffer $D_i$, the channel parameter $H_i$, and the absorbed solar energy $E^H_i$. Dynamic channel models for the satellite-to-ground and inter-satellite links are established according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and the channel parameter $H_i$ is obtained by simulation. Taking the orbital characteristics of satellite operation into account, a dynamic energy harvesting model is established and the absorbed solar energy $E^H_i$ is obtained by simulation. The action set $A = \{A_r \times A_t\}$ comprises the receive power $A_r$ and the transmit power $A_t$, expressed respectively as $A_r = \{0, \delta, 2\delta, \ldots, P_{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \ldots, P_{MAX}\}$, where $\delta$ is the step size, 0 means that no data are received or transmitted, and $P_{MAX}$ is the maximum power value, which takes one value when the transmission link is a satellite-to-ground link and another otherwise. The reward $R$ is expressed by the amount of data sent by the satellite, taken at the start of the time slot. The action value function $Q^{\pi}(S_i, P_i) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{i+k} \mid S_i, P_i\right]$ is the expected return the agent obtains after performing action $P_i$ in state $S_i$ and then following policy $\pi$, where $\gamma$ is the discount factor. This completes the establishment of the remote sensing satellite network model for an uncertain environment.
(2) Generate the environmental parameter data: simulate the remote sensing satellite network model in STK software to export the raw environmental parameter data over one topology period, then process the raw data in MATLAB to obtain, for each time slot, the link on/off state, the link connection duration, the remote sensing satellite position and the time spent in sunlight; these data serve as the environmental parameter data of the intelligent resource joint scheduling method.
(3) Initialize the parameters required by the intelligent resource joint scheduling method: the number of time slots per period $T$, the on-board battery capacity $B_{max}$, the battery capacity threshold $B_{min}$, the data memory capacity $D_{max}$, the static power consumption $P_{cons}$, the unit time-slot length $\tau$, the exploration rate $\varepsilon$, the Critic network parameter $\omega_{critic}$, the Actor network parameter $\omega_{actor}$, the learning rate $\alpha$, the update interval $T_{copy}$ of the Critic network parameters, the update interval $T_{train}$ of the Actor network parameters, the total number of training time slots $I$, the current time slot number $i$, and the discount factor $\gamma$.
(4) Guide the satellite's power allocation: observe the state $S_i$; for each feasible action, extract the feature vector $f_i(S_i, P_i)$ of the state-action pair using the defined six-dimensional feature function, which reflects the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter $\omega_{actor}$ and use the $\varepsilon$-greedy policy to select an action $P_i$ from the set of feasible actions as the power allocation scheme for the current time slot, and guide the satellite to allocate power accordingly.
(5) Pre-transition the state of the remote sensing satellite network: compute the reward $R_i$ in the network model and check whether the iteration has finished: if $i = I$, go to step (10); otherwise go to the next step and continue the iteration.
(6) Guide the satellite's power pre-allocation: observe the pre-transition state $S'_i$; for each feasible action, extract the feature vector $f'_i(S'_i, P'_i)$ of the state-action pair using the defined six-dimensional feature function; combine it with the Actor network parameter $\omega_{actor}$ and use the $\varepsilon$-greedy policy to select an action $P'_i$ from the set of feasible actions as the pre-selected power allocation scheme for the next time slot; then store the sample $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory for later network parameter updates.
(7) Update check for the Critic network parameter $\omega_{critic}$: compute the remainder of the current time slot number $i$ divided by the Critic update interval $T_{copy}$; if $i \bmod T_{copy} = 0$, update the Critic network parameter by setting $\omega_{critic} = \omega_{actor}$ and go to the next step; otherwise go directly to the next step.
(8) Update check for the Actor network parameter $\omega_{actor}$: compute the remainder of the current time slot number $i$ divided by the Actor update interval $T_{train}$; if $i \bmod T_{train} = 0$, update the Actor network parameter $\omega_{actor}$ according to the gradient descent strategy and go to the next step; otherwise go directly to the next step.
(9) Update the network state, the action and the current time slot number: $S_{i+1} = S'_i$, $P_{i+1} = P'_i$, $i = i + 1$. This completes one iteration; then return to step (5).
(10) Obtain the network parameter $\omega_{critic}$ that guides joint scheduling: output the network parameter $\omega_{critic}$ obtained by training, which ends the intelligent resource joint scheduling method. In practical applications, a resource joint scheduling scheme is generated from this parameter using the greedy policy (that is, the $\varepsilon$-greedy policy with $\varepsilon = 0$).
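Step (1) defines each action as a receive/transmit power pair drawn from a grid with step $\delta$. A minimal Python sketch of enumerating $A = A_r \times A_t$ follows. It assumes $P_{MAX}$ is an integer multiple of $\delta$ and takes separate receive and transmit maxima as inputs, since the link-type-dependent values of $P_{MAX}$ are not given here; the `State` tuple and all function names are illustrative only.

```python
from itertools import product
from typing import NamedTuple


class State(NamedTuple):
    """Network state S_i at the start of time slot i (illustrative container)."""
    battery: float   # B_i, current battery charge
    buffer: float    # D_i, data already in the buffer
    channel: float   # H_i, channel parameter
    solar: float     # E^H_i, absorbed solar energy


def power_levels(delta: float, p_max: float) -> list:
    """Return {0, delta, 2*delta, ..., p_max}; level 0 means idle (no rx/tx).

    Assumes p_max is (close to) an integer multiple of delta.
    """
    n = int(round(p_max / delta))
    return [k * delta for k in range(n + 1)]


def action_set(delta: float, p_max_rx: float, p_max_tx: float) -> list:
    """A = A_r x A_t as a list of (receive_power, transmit_power) pairs."""
    return list(product(power_levels(delta, p_max_rx),
                        power_levels(delta, p_max_tx)))
```

For example, with $\delta = 0.5$, a receive maximum of 1.0 and a transmit maximum of 2.0, the grid holds 3 × 5 = 15 actions, the first being the idle pair (0.0, 0.0).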
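Step (5) computes the reward $R_i$, the amount of data the satellite sends. No rate formula is given in this text, so the sketch below assumes a Shannon-capacity link, $R_i = \min\left(D_i,\ \tau W \log_2\left(1 + \frac{P_t H_i}{N_0 W}\right)\right)$, with a hypothetical bandwidth $W$ and noise power spectral density $N_0$, and with $H_i$ treated as a linear power gain. The cap at $D_i$ expresses that the satellite cannot send more than it has buffered.

```python
import math


def slot_reward(p_tx: float, channel_gain: float, buffer_bits: float,
                tau: float, bandwidth_hz: float, noise_psd: float) -> float:
    """Hypothetical reward: bits sent in one slot under a Shannon-rate model.

    p_tx is the transmit power, channel_gain is H_i as a linear power gain,
    buffer_bits is D_i; bandwidth_hz and noise_psd are assumed link parameters.
    """
    if p_tx <= 0:
        return 0.0  # transmit power level 0 means nothing is sent
    snr = p_tx * channel_gain / (noise_psd * bandwidth_hz)
    capacity_bits = tau * bandwidth_hz * math.log2(1.0 + snr)
    return min(buffer_bits, capacity_bits)
```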
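Step (6) stores each sample $(f_i, P_i, R_i, f'_i, P'_i)$ in an experience memory for later parameter updates. A minimal sketch follows, assuming a fixed-capacity first-in-first-out memory with uniform random sampling; neither the capacity nor the sampling scheme is specified in the text, and the class names are illustrative.

```python
import random
from collections import deque
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    f: Sequence[float]        # f_i(S_i, P_i), features of the executed pair
    action: int               # P_i, index into the action set
    reward: float             # R_i
    f_next: Sequence[float]   # f'_i(S'_i, P'_i), features of the pre-selected pair
    action_next: int          # P'_i


class ExperienceMemory:
    """Fixed-capacity FIFO memory; the oldest samples are discarded first."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        self._buf.append(t)

    def sample(self, batch_size: int, rng=random) -> list:
        # Uniform sampling without replacement; returns fewer items if the
        # memory holds fewer than batch_size samples.
        return rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)
```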
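Step (10) uses the trained parameter $\omega_{critic}$ with the greedy policy ($\varepsilon = 0$): in each slot, the feasible action with the largest linear value $\omega_{critic}^{\top} f(S_i, P)$ is chosen. A short sketch, assuming the feature vectors of the feasible actions have already been computed; the function name is illustrative.

```python
from typing import Sequence


def greedy_schedule(feature_rows: Sequence[Sequence[float]],
                    actions: Sequence[tuple],
                    w_critic: Sequence[float]) -> tuple:
    """Pick the feasible action with the largest w_critic . f (epsilon = 0)."""
    if not actions or len(actions) != len(feature_rows):
        raise ValueError("need one feature vector per feasible action")
    scores = [sum(w * x for w, x in zip(w_critic, f)) for f in feature_rows]
    best = max(range(len(actions)), key=scores.__getitem__)
    return actions[best]  # ties resolve to the first maximal action
```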
For remote sensing satellite networks, on the one hand, the invention jointly considers the task transmission flow and the environmental characteristics of the network and establishes a relatively comprehensive network model. On the other hand, at the design stage it solves the optimization problem with reinforcement learning, which reduces complexity; it handles the continuous state space of the network through linear approximation with a defined feature function and weight vector; and it searches for the optimal resource scheduling scheme on the basis of an accurate state characterization, improving the accuracy of the result.
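The linear approximation described above scores each state-action pair as $Q(S_i, P_i) \approx \omega^{\top} f_i(S_i, P_i)$, and steps (4) and (6) pick an action with the $\varepsilon$-greedy policy. A minimal Python sketch of that selection follows, assuming the caller supplies one feature vector per feasible action; the function names are illustrative, not taken from the patent.

```python
import random
from typing import Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear approximation Q(S, P) ~ w . f(S, P)."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(feature_rows: Sequence[Sequence[float]],
                   weights: Sequence[float], epsilon: float, rng=random) -> int:
    """Return the index of the chosen feasible action.

    feature_rows[k] is f(S_i, P_k) for the k-th feasible action. With
    probability epsilon a random feasible action is explored; otherwise the
    action with the largest linear Q value is exploited.
    """
    if not feature_rows:
        raise ValueError("no feasible action")
    if rng.random() < epsilon:
        return rng.randrange(len(feature_rows))
    scores = [q_value(weights, f) for f in feature_rows]
    return max(range(len(scores)), key=scores.__getitem__)
```

Setting `epsilon=0.0` reduces this to the purely greedy choice used once training is complete.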
The invention helps the remote sensing satellite balance battery resources against data transmission in a dynamic environment, ensures efficient task transmission in the network, and improves transmission performance.
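The balance between battery resources and transmission can be made concrete with a per-slot energy check that restricts the set of feasible actions. The sketch below is an assumption, not the patent's energy model: it treats the receive power, transmit power and static power $P_{cons}$ as drawn over the whole slot length $\tau$, adds the absorbed solar energy $E^H_i$ (zero in the Earth's shadow), clamps the result at $B_{max}$, and keeps only actions whose post-slot charge stays at or above $B_{min}$.

```python
def next_battery(b: float, solar: float, p_rx: float, p_tx: float,
                 p_cons: float, tau: float, b_max: float) -> float:
    """Hypothetical battery update over one slot of length tau."""
    return min(b_max, b + solar - (p_cons + p_rx + p_tx) * tau)


def feasible_actions(actions, b: float, solar: float, p_cons: float,
                     tau: float, b_min: float, b_max: float) -> list:
    """Keep (p_rx, p_tx) pairs that do not drive the battery below b_min."""
    return [(pr, pt) for pr, pt in actions
            if next_battery(b, solar, pr, pt, p_cons, tau, b_max) >= b_min]
```

In eclipse (`solar=0.0`) with $B_i = 10$, $P_{cons} = 1$, $\tau = 1$ and $B_{min} = 6$, the pair (2.0, 2.0) would leave 5 units and is filtered out, while (1.0, 1.0) leaves 7 and is kept.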
Compared with the prior art, the invention has the following advantages:
Satellite operating characteristics are reflected in resource scheduling: the design of the resource joint scheduling method fully considers satellite attributes and working characteristics, such as the static energy consumption that on-board systems like thermal control and housekeeping require to keep the remote sensing satellite operating normally. The satellite's multi-dimensional resources are considered jointly, reflecting the resource scheduling characteristics of joint task acquisition and transmission, and the optimal resource scheduling scheme is determined on that basis.
The scheduling method is closer to a real remote sensing satellite network: a remote sensing satellite network model is built, the environmental and position data of the network over one topology period with given parameters and scale are obtained through STK simulation, and the raw data are processed in MATLAB to obtain time-slotted environmental and position parameters, which makes the scenario of the intelligent resource joint scheduling method more realistic.
Complex constrained planning problems are solved with reinforcement learning: using reinforcement learning makes the solution of the multidimensional resource joint scheduling problem independent of any non-causal data and statistical characteristics. To address the continuous, infinite state space, the method defines a six-dimensional feature vector that reflects both the environmental characteristics of the network and the task transmission characteristics of the satellite, maps each state-action pair to this vector, and evaluates the quality of an action by linear approximation with a weight vector. This handles the infinite state space and avoids the storage burden that state discretization would cause. In addition, the invention constructs two independent networks with the same structure but different parameters, which mitigates overestimation during parameter updates to some extent.
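The two parameter vectors mentioned above correspond to steps (7) and (8): $\omega_{actor}$ is trained by gradient descent every $T_{train}$ slots, and $\omega_{critic}$ is overwritten with a copy of $\omega_{actor}$ every $T_{copy}$ slots. The sketch below assumes a SARSA-style target built from the stored next pair, $y = R_i + \gamma\, \omega_{critic}^{\top} f'_i$, and a mean squared TD-error loss over a sampled batch; the exact loss is not stated in this section, and the function names are illustrative. Evaluating the target with the slower-moving $\omega_{critic}$ is what is intended to curb overestimation.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sync_critic(i: int, t_copy: int, w_actor: list, w_critic: list) -> list:
    """Step (7): every t_copy slots, omega_critic <- omega_actor (a copy)."""
    if i % t_copy == 0:
        return list(w_actor)
    return w_critic


def actor_gradient_step(w_actor, w_critic, batch, alpha, gamma):
    """Step (8): one gradient-descent step on 0.5 * mean squared TD error.

    Each batch item is (f, P, R, f_next, P_next). The target uses the critic
    weights, y = R + gamma * w_critic . f_next, and is held fixed; the
    prediction uses the actor weights, q = w_actor . f. For a linear model
    dL/dw_actor = -(y - q) * f, averaged over the batch.
    """
    if not batch:
        return list(w_actor)
    grad = [0.0] * len(w_actor)
    for f, _p, r, f_next, _p_next in batch:
        err = (r + gamma * dot(w_critic, f_next)) - dot(w_actor, f)
        for k, x in enumerate(f):
            grad[k] -= err * x
    n = len(batch)
    return [w - alpha * g / n for w, g in zip(w_actor, grad)]
```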
Drawings
FIG. 1 is an overall flow diagram of an implementation of the present invention;
FIG. 2 is a schematic diagram of a remote sensing satellite network model in the present invention;
FIG. 3 is a sub-flow diagram of selecting actions based on the ε-greedy policy in the present invention;
FIG. 4 is a sub-flow diagram of the gradient descent strategy of the present invention;
FIG. 5 is a graph of cost values in the present invention;
FIG. 6 is a comparison graph of average periodic rewards in the present invention.
The invention is described in detail below with reference to the attached drawings and examples.
Detailed Description
Example 1
The particularity and dynamic variability of the remote sensing satellite network environment, together with the diversity of the satellite's energy consumption, distinguish remote sensing satellite networks from other communication networks. Research on resource scheduling in these networks is extensive, and methods can be classified as static or dynamic according to whether environmental data must be predicted. Static algorithms are based on known conditions: the environmental data for all future times must be known before the satellite starts a transmission task. Although static algorithms establish an upper bound on network performance, they are overly idealized and non-causal, so they apply to few scenarios and cannot serve most real-life situations. Dynamic algorithms are based on an unknown environment: the remote sensing satellite does not need any environmental data in advance. Dynamic algorithms can be further divided into two categories. The first category is dynamic programming, based on a Markov decision process model, which can solve such problems when the statistical characteristics of the data, such as state transition probabilities, are known. However, the computational complexity of dynamic programming grows sharply with problem size, placing a heavy computational burden on low-power devices; moreover, not all processes have statistical characteristics, and those that do may change with conditions and time, so this approach still has drawbacks.
The second category is based on reinforcement learning and does not presuppose environmental data or a state transition model, which means the environmental data at each moment become available only when that moment arrives; this makes it better suited to the actual situation of remote sensing satellite networks. However, existing research on this approach either omits the data reception process of the remote sensing satellite network or ignores the particular environment in which the network operates, such as time-varying channel conditions and the satellite's alternation between sunlight and the Earth's shadow. Overall, current research does not comprehensively consider these characteristics, and the multi-dimensional resource joint scheduling problem of remote sensing satellite networks remains to be studied further.
In response, through research and experiments, the invention provides an intelligent multi-dimensional resource joint scheduling method that does not depend on known statistical characteristics and is suited to the environment and resource scheduling scenario of remote sensing satellite networks in uncertain environments.
In the intelligent resource joint scheduling method for remote sensing satellite networks in uncertain environments according to the invention, the established network model matches the environment and resource scheduling scenario of the remote sensing satellite network, and reinforcement learning avoids directly solving a high-complexity planning problem. The method comprises the following steps:
(1) Establish a remote sensing satellite network model for an uncertain environment. The remote sensing satellite network mainly comprises remote sensing satellites, which acquire and transmit data, and ground stations, which receive the data. First determine the scale and parameters of the network, including the number and positions of the remote sensing satellites and ground stations. Then define the state set $S$, the action set $A$, the reward $R$ and the action value function $Q^{\pi}(S_i, P_i)$ of the network. In the invention the state set is $S = \{B \times D \times H \times E^H\}$; at the start of the $i$-th time slot, the network state $S_i$ consists of four parts: the current battery charge $B_i$, the amount of data already in the data buffer $D_i$, the channel parameter $H_i$, and the absorbed solar energy $E^H_i$. The battery charge has an upper limit $B_{max}$ and a lower limit $B_{min}$, and the data buffer has an upper limit $D_{max}$. Satellite-to-ground and inter-satellite channels follow different channel models, so their channel parameters are computed differently. The absorbed solar energy changes as the satellite alternates between sunlight and the Earth's shadow; when the satellite is in the Earth's shadow, no solar energy is supplied. The invention establishes dynamic channel models for the satellite-to-ground and inter-satellite links according to Recommendations ITU-R P.618-13, ITU-R P.838 and ITU-R P.839, and obtains the channel parameter $H_i$ by simulation.
Taking the orbital characteristics of satellite operation into account, the invention establishes a dynamic energy harvesting model and obtains the absorbed solar energy $E^H_i$ by simulation. The action set $A = \{A_r \times A_t\}$ comprises the receive power $A_r$ and the transmit power $A_t$, expressed respectively as $A_r = \{0, \delta, 2\delta, \ldots, P_{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \ldots, P_{MAX}\}$, where $\delta$ is the step size, 0 means that no data are received or transmitted, and $P_{MAX}$ is the maximum power value, which takes one value when the transmission link is a satellite-to-ground link and another otherwise. The reward $R$ is expressed by the amount of data sent by the satellite, taken at the start of the time slot. In the invention, the action value function $Q^{\pi}(S_i, P_i)$ is the expected return the agent obtains after performing action $P_i$ in state $S_i$ under the guidance of policy $\pi$. This completes the establishment of the remote sensing satellite network model for an uncertain environment. In this step, the method takes the satellite's data acquisition process into account and reflects the resource scheduling characteristic of jointly planning data acquisition and transmission. In addition, the method models the two kinds of channels in the uncertain environment of the network and considers the characteristics of the energy supply, so it better fits the remote sensing satellite network scenario.
(2) Generating the environmental parameter data: the remote sensing satellite network model is simulated in STK to export raw environmental parameter data for one topology period. These data include the start time, end time and duration of the links established between the remote sensing satellite and all ground stations and relay satellites, the longitude, latitude and altitude of the remote sensing satellite, and the durations for which the satellite is in sunlight. The raw data are then processed in MATLAB and re-slotted with τ as the unit slot length. This gives, for each time slot, the sunlit duration of the remote sensing satellite, the link on/off state and the link duration, which serve as the environmental parameter data of the intelligent resource joint scheduling method. Obtaining the raw environmental data from an STK-simulated operating scenario and processing them in MATLAB makes the simulation results of the proposed method more accurate.
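The source performs this re-slotting in MATLAB. The Python sketch below illustrates the same idea. It assumes the STK report has already been parsed into (start, end) window pairs in seconds; the parsing is not shown. For each slot $[k\tau, (k+1)\tau)$, it returns how many seconds the windows cover. The function name `slot_windows` is hypothetical. The same routine can be applied to each link's windows and to the sunlight windows.

```python
import math


def slot_windows(windows, tau, num_slots):
    """Seconds of each slot [k*tau, (k+1)*tau) covered by the given windows.

    windows: iterable of (start, end) pairs in seconds (assumed already parsed
    from an STK report). Call once per link, or once for sunlight intervals.
    """
    covered = [0.0] * num_slots
    for start, end in windows:
        first = max(0, int(start // tau))
        last = min(num_slots - 1, math.ceil(end / tau) - 1)
        for k in range(first, last + 1):
            # Overlap between the window and slot k.
            overlap = min(end, (k + 1) * tau) - max(start, k * tau)
            if overlap > 0:
                covered[k] += overlap
    return covered


link_duration = slot_windows([(5.0, 25.0)], tau=10.0, num_slots=4)
link_on = [d > 0 for d in link_duration]  # per-slot link on/off state
```

A slot is marked "on" whenever any part of it is covered. A stricter rule, such as requiring a minimum duration, would be an equally plausible design choice.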
(3) Initializing the parameters of the intelligent resource joint scheduling method: these comprise
- the number of time slots per period $T$;
- the on-board battery capacity $B_{max}$;
- the battery capacity threshold $B_{min}$;
- the data buffer capacity $D_{max}$;
- the static power consumption $P_{cons}$;
- the unit slot length $\tau$;
- the exploration rate $\varepsilon$;
- the Critic network parameters $\omega_{critic}$;
- the Actor network parameters $\omega_{actor}$;
- the learning rate $\alpha$;
- the Critic parameter update interval $T_{copy}$;
- the Actor parameter update interval $T_{train}$;
- the total number of training time slots $I$;
- the current time slot index $i$;
- the discount factor $\gamma$.
(4) Guiding the satellite's power allocation: observe the state $S_i$. For each feasible action, extract the feature vector $f_i(S_i,P_i)$ of the state-action pair using the six-dimensional feature function, which is defined to reflect the working characteristics of the remote sensing satellite and the influence of its environment. Combine this with the Actor network parameters $\omega_{actor}$ and select an action $P_i$ from the feasible action set with an ε-greedy policy. This action is the power allocation scheme for the current time slot and directs the satellite's power allocation. Each of the six feature dimensions captures one aspect of the joint scheduling of the network's multidimensional resources. Together they form the feature vector of a state-action pair, which is used to approximate the value function. The ε-greedy policy helps keep the resource scheduling method from being trapped in a local optimum.
(5) Pre-transition of the network state: compute the reward $R_i$ in the network model and determine whether the iteration has finished. If it has, go to step (10); otherwise continue with the next step and execute a new iteration. Because the method converges after a sufficient number of iterations, it terminates when the current time slot index $i$ equals the total number of training time slots $I$.
(6) Guiding the satellite's power pre-allocation: observe the pre-state $S'_i$. For each feasible action, extract the feature vector $f'_i(S'_i,P'_i)$ of the state-action pair using the same six-dimensional feature function. Combine this with the Actor network parameters $\omega_{actor}$ and select an action $P'_i$ from the feasible action set with an ε-greedy policy, as the preselected power allocation scheme for the next time slot. Store the sample $(f_i,P_i,R_i,f'_i,P'_i)$ in the experience memory for later network parameter updates. "Pre-allocation" means that the power for the start of the next time slot is allocated while still within time slot $i$; the prime mark denotes these pre-variables $f'_i$, $P'_i$ and $S'_i$. Because the experience memory has limited capacity, once it is full each new sample replaces the oldest one.
(7) Critic network parameter update check: compute the remainder of the current time slot index $i$ divided by the Critic update interval $T_{copy}$. If $i \bmod T_{copy} = 0$, update the Critic network parameters by setting $\omega_{critic} = \omega_{actor}$ and then proceed to the next step; otherwise proceed directly to the next step. The Critic network parameters $\omega_{critic}$ are used to compute the approximate target action value. They are also the output of the method and guide the joint scheduling of the multidimensional resources. $\omega_{critic}$ is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own $\omega_{critic}$, which is selected in the calculation according to the current link. Because the two link types have different transmit action sets, the satellite-to-ground $\omega_{critic}$ and the inter-satellite $\omega_{critic}$ have different numbers of rows.
(8) Actor network parameter update check: compute the remainder of the current time slot index $i$ divided by the Actor update interval $T_{train}$. If $i \bmod T_{train} = 0$, update the Actor network parameters $\omega_{actor}$ by gradient descent and then proceed to the next step; otherwise proceed directly to the next step. The Actor network parameters $\omega_{actor}$ are used to select the power allocation scheme. $\omega_{actor}$ is a matrix whose number of rows equals the number of elements in the action set A and whose number of columns equals the number of feature functions. The satellite-to-ground link and the inter-satellite link each have their own $\omega_{actor}$, which is selected in the calculation according to the current link. Because the two link types have different transmit action sets, the satellite-to-ground $\omega_{actor}$ and the inter-satellite $\omega_{actor}$ have different numbers of rows.
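Steps (7) and (8) are both triggered by a modulo test on the slot index. The Python sketch below assumes the weights are NumPy arrays with one row per action and one column per feature. It treats the Actor training routine as a caller-supplied function, `train_actor`, which is hypothetical; for example, it could be the gradient step of Example 5.

```python
import numpy as np


def periodic_updates(i, w_critic, w_actor, t_copy, t_train, train_actor):
    """Apply steps (7) and (8) for slot i and return (w_critic, w_actor).

    train_actor: hypothetical caller-supplied function (w_actor, w_critic) -> w_actor.
    """
    # Step (7): every T_copy slots, overwrite the Critic with a copy of the Actor.
    if i % t_copy == 0:
        w_critic = np.array(w_actor, copy=True)
    # Step (8): every T_train slots, let the Actor take one training step.
    if i % t_train == 0:
        w_actor = train_actor(w_actor, w_critic)
    return w_critic, w_actor
```

The source keeps separate satellite-to-ground and inter-satellite matrices, so the function would be called on whichever pair matches the current link. The explicit copy keeps the Critic frozen between copies even if the Actor is later modified in place.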
(9) Updating the network state, the action and the iteration index: set $S_{i+1} = S'_i$ and $P_{i+1} = P'_i$, increment the slot index $i$ by one, and return to step (5) for the next state pre-transition. In step (5), the method either finishes and outputs the result or starts a new iteration. The new state $S_{i+1}$ and the new action $P_{i+1}$ are the pre-state $S'_i$ and the pre-action $P'_i$ from steps (5) and (6).
(10) Obtaining the network parameters $\omega_{critic}$ that guide joint scheduling: output the network parameters $\omega_{critic}$ obtained by training, which ends the method. In practical applications, a joint resource scheduling scheme is generated from these parameters with a greedy policy, that is, an ε-greedy policy with ε = 0. At the start of each time slot during satellite operation, the following steps are performed:
- form the state-action pairs $(S_i,P_i)$ from the current state $S_i$ and each action in the feasible action set;
- compute the approximate action value of each pair with $\omega_{critic}$;
- select the action $P_i$ with the largest approximate action value and execute it as the best action;
- move on to the next time slot and repeat.
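Deployment in step (10) amounts to the selection rule with ε = 0 applied to the trained $\omega_{critic}$. The sketch below makes two assumptions:
- a caller-supplied `feature_fn(state, action_index)` (hypothetical) stands in for the six feature functions of Example 3;
- a dictionary holds one $\omega_{critic}$ per link type, because the two link types have different numbers of actions.

```python
import numpy as np


def greedy_action(w_critic_by_link, link_type, state, feasible_idx, feature_fn):
    """Return the feasible action index with the largest approximate action value."""
    w = w_critic_by_link[link_type]  # rows: actions, columns: feature functions
    values = [float(w[a] @ feature_fn(state, a)) for a in feasible_idx]
    return feasible_idx[int(np.argmax(values))]
```

Ties are broken in favor of the first feasible action, because `np.argmax` returns the first maximum.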
The prior art does not comprehensively consider the remote sensing satellite network scenario or the task transmission flow of remote sensing satellites. To address this, the invention takes into account both the working characteristics and the environmental characteristics of the satellites. It obtains time-varying environment data and link information from a simulated network, uses reinforcement learning as the algorithmic framework and a Markov decision process as the underlying model, and adds linear approximation and a dual-network structure. The result is a complete technical scheme for intelligent joint resource scheduling of remote sensing satellites in a network under an uncertain environment. It jointly and effectively uses the satellites' multidimensional resources to optimize the transmission performance of the network.

The idea of the invention is as follows. First, a remote sensing satellite network model is established and the task transmission flow of the remote sensing satellite is defined. Second, with maximizing the amount of data transmitted by the network as the objective, constraints are derived from the environment and the satellite attributes, and the joint resource scheduling problem is modeled as an optimization problem. This problem cannot be solved directly, because non-causal (future) data are not available in a remote sensing satellite network. Finally, an intelligent resource joint scheduling method is proposed that, relying only on causal data, guides the satellite toward optimal joint resource scheduling through continuous learning.
Starting with no experience, the remote sensing satellite accumulates experience by trial and error during the learning period and keeps updating value-based parameters until they converge. The converged parameters output by the method form the basis of the final power allocation, which realizes joint scheduling of the multidimensional resources. Beforehand, the network is simulated to obtain the link connections and energy arrivals over one topology period, which serve as the method's environmental parameters.

At the start of each time slot, the satellite allocates power according to an ε-greedy policy: with probability ε it selects a power allocation scheme at random, and with probability 1 − ε it selects the scheme that maximizes the approximate action-value function. Because the satellite's state space is continuous and therefore infinite, the invention introduces a weight vector. The inner product of the weight vector and the feature vector obtained by feature extraction linearly approximates the action-value function. After a power allocation scheme is determined and executed, the satellite receives a reward as feedback on that allocation, and a state pre-transition takes place. A power allocation scheme is then selected according to the ε-greedy policy and executed at the start of the next time slot. This process repeats as the time slots advance, and each repetition is one slot iteration. In every time slot, part of the parameters are saved to the experience memory as samples for the subsequent network parameter updates.
The Actor network parameters are updated at regular intervals by gradient descent. The weight vector is moved in the direction opposite to the gradient, which reduces the error between the approximate action-value function and the action-value function. The Critic network parameters are updated by copying them from the Actor network. To ensure convergence, both the exploration rate ε and the learning rate α decrease as the number of iterations grows. Intuitively, as iterations accumulate, the joint scheduling policy guided by the weight vector becomes increasingly effective, and the room for further improvement shrinks. Simulation results show that the method converges after a certain number of iterations and outperforms the other methods compared under the same conditions.
Example 2
The intelligent resource joint scheduling method is the same as in Example 1. To keep the resource scheduling method from being trapped in a local optimum, an ε-greedy policy is adopted during the learning stage. Early in the learning stage, the remote sensing satellite is more inclined to explore, i.e., to try resource scheduling schemes it has not yet tried. Later in the learning stage it is greedier, i.e., it selects the best resource scheduling scheme according to the experience accumulated so far.
In step (4), the action $P_i$ selected from the feasible action set by the ε-greedy policy is a pair of received and transmit powers. Under the current environmental conditions, the satellite receives and transmits the corresponding data according to this power allocation scheme, at the cost of energy consumption. The step comprises the following sub-steps:
(4a) Computing the feasible action set $\{A_f\}_i$: the satellite is constrained by the on-board battery energy and the data buffer capacity, so it must allocate power in a way that satisfies these resource constraints. The capacity threshold $B_{min}$ keeps the battery charge from falling so low that the satellite's service life is shortened. The invention therefore specifies that when the battery charge $B_i$ is lower than $B_{min} + P_{cons} \times \tau$, the satellite neither transmits nor receives any data. From the network state $S_i$, the feasible action set $\{A_f\}_i$ is computed as the set of all actions $P_i$ that satisfy the constraints. Each action comprises a received power $P_{ir}$ and a transmit power $P_{it}$. The resource constraints are as follows:
where $\tau_{ih}$ is the link duration at the start of the $i$-th time slot, and $C_{it}(P_{it},H_i)$ is the link transmission rate at the start of the $i$-th time slot, given the current transmit power $P_{it}$ and channel parameter $H_i$. The rate is calculated as follows:
where $B_c$ (Hz) is the channel bandwidth.
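The constraint and rate formulas of (4a) are not reproduced in this text. The Python sketch below illustrates the filtering under three assumptions:
- a Shannon-type rate $C = B_c \log_2(1 + P_{it} H_i)$;
- an energy constraint $(P_{ir} + P_{it} + P_{cons})\tau \le B_i - B_{min}$;
- a buffer constraint: data sent over the link duration $\tau_{ih}$ cannot exceed the buffered data $D_i$.

These forms are assumptions for illustration, not the patent's own formulas. Only the low-battery idle rule is taken directly from the text. The function names are hypothetical.

```python
import math


def link_rate(p_t, h, bandwidth_hz):
    """ASSUMED Shannon-type link rate in bit/s: B_c * log2(1 + P_t * H)."""
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def feasible_actions(actions, b_i, d_i, h_i, tau, tau_link, b_min, p_cons, bandwidth_hz):
    """Filter (P_r, P_t) pairs by the assumed battery and buffer constraints."""
    # Rule from the text: below B_min + P_cons * tau the satellite stays idle.
    if b_i < b_min + p_cons * tau:
        return [(0.0, 0.0)]
    feasible = []
    for p_r, p_t in actions:
        # ASSUMED energy constraint: consumption must not push the battery below B_min.
        if (p_r + p_t + p_cons) * tau > b_i - b_min:
            continue
        # ASSUMED buffer constraint: cannot send more bits than are stored.
        if link_rate(p_t, h_i, bandwidth_hz) * tau_link > d_i:
            continue
        feasible.append((p_r, p_t))
    return feasible
```

Received data is not hard-limited here. In the source, buffer overflow on reception is discouraged through the feature functions $f_2$ and $f_5$ rather than forbidden.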
(4b) Computing the six-dimensional feature vector $f_i(S_i,P_i)$: for each state-action pair $(S_i,P_i)$, the feature vector is computed with six feature functions. Each element of the feature vector is a value not exceeding 1. It quantifies, in that dimension, the effect of performing action $P_i$ in state $S_i$, so each feature is a function of both the state and the action.
(4c) Selecting action $P_i$ with the ε-greedy policy: under this policy, the satellite selects the action with the highest approximate action value from the feasible action set $\{A_f\}_i$ with probability 1 − ε, or selects an action at random with probability ε. The single resulting action $P_i$ is expressed as follows:
where $\hat{Q}(S_i,P_i;\omega_{actor})$ is the approximate action-value function of the $i$-th time slot, based on the state-action pair $(S_i,P_i)$ and the Actor network parameters $\omega_{actor}$. It is the inner product of the feature vector $f_i(S_i,P_i)$ and the row of $\omega_{actor}$ that corresponds to action $P_i$. By the definition of the state set S, the state space of the network is continuous and infinite. Even if the states were discretized, the state table would have to be enormous to keep distortion low, which makes storage difficult. The invention therefore uses linear approximation to plan directly over continuous states.
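The selection rule in (4c), combined with the linear approximation, can be sketched in Python under the following assumptions:
- $\omega_{actor}$ is a NumPy array with one row per action and one column per feature function;
- actions are referred to by row index;
- `feature_fn` is a hypothetical stand-in for the six feature functions.

```python
import numpy as np


def q_hat(weights, action_idx, features):
    """Linear approximation: inner product of the action's weight row and the features."""
    return float(weights[action_idx] @ features)


def epsilon_greedy(weights, state, feasible_idx, feature_fn, epsilon, rng):
    """With probability epsilon pick a random feasible action, otherwise the best one."""
    if rng.random() < epsilon:
        return int(rng.choice(feasible_idx))
    values = [q_hat(weights, a, feature_fn(state, a)) for a in feasible_idx]
    return int(feasible_idx[int(np.argmax(values))])


rng = np.random.default_rng(0)  # seeded generator for reproducible exploration
```

Passing the generator in explicitly makes exploration reproducible in simulation. With ε = 0, the function reduces to the greedy rule of step (10).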
Example 3
The intelligent resource joint scheduling method is the same as in Example 1. Step (4b), computing the six-dimensional feature vector $f_i(S_i,P_i)$, considers the following six dimensions.
(4b1) Computing the first dimension: the first dimension indicates whether the action takes the battery energy state into account, i.e., whether the energy consumed by performing the action can prevent the potential energy overflow caused by absorbed solar energy. In a resource-limited remote sensing satellite network, solar energy is precious, and the satellite should make full use of the harvested solar energy to store and transmit data. The feature function $f_1(S_i,P_i)$ is expressed as follows:
where the consumption term is the energy consumed in the current time slot. This dimension captures, from the energy perspective, both the satellite's environment and its own attributes. It considers the harvesting of solar energy, the static energy consumption of each time slot and the upper limit of the on-board battery capacity.
(4b2) Computing the second dimension: the second dimension indicates whether the action takes the data buffer state into account, i.e., whether the amount of data sent can prevent the potential data overflow caused by received data. Data overflow means that the energy spent on receiving data does not yield the expected amount of stored data, so part of that energy is wasted without producing its expected return. The feature function $f_2(S_i,P_i)$ is expressed as follows:
where the received-data term is the amount of data the satellite receives in the $i$-th time slot, the transmitted-data term is the amount of data it transmits in the $i$-th time slot, and $D_R^{max}$ is the maximum amount of data that can be received. This dimension reflects the satellite's data-receiving process from the perspective of the data buffer.
(4b3) Computing the third dimension: the third dimension indicates whether the action is consistent with the optimal power allocation scheme. Because the satellite-to-ground and inter-satellite link models differ, the feature function $f_3(S_i,P_i)$ must be expressed separately for each link condition.
Here the optimal power allocation scheme comprises a received power and a transmit power. It is obtained with the Lagrange multiplier method, subject to the multidimensional resource constraints, by maximizing the total amount of data transmitted in the current and next time slots. For two consecutive time slots, there are four link-switching cases:
- satellite-to-ground link to satellite-to-ground link;
- inter-satellite link to inter-satellite link;
- inter-satellite link to satellite-to-ground link;
- satellite-to-ground link to inter-satellite link.
For two consecutive satellite-to-ground links, the optimal power is solved according to the water-filling theorem.
Here $B_s = P_{cons} \times \tau$ is the static energy consumption and $P_i^{WF}$ is the optimal (water-filling) power. The averaged channel term is the mean of the historical channel parameters and is used to estimate the channel parameter at the next time step. $B_{[i,i+1]}$ is the maximum energy available for data transmission over the $i$-th and $(i+1)$-th time slots. To guarantee that the optimal power is feasible, the invention imposes the following constraint:
where the first term is the maximum transmit power in the current feasible action set, $\lfloor\cdot\rfloor$ denotes rounding down, and $\delta_i$ is the step size of the transmit power set in this time slot. The total amount of data transmitted over the two time slots can then be expressed as follows:
where the bound term is the maximum transmit power in the feasible action set after the transition to the next time step, given the power allocation scheme chosen at the previous time step.
For two consecutive inter-satellite links, the optimal power is solved by linear programming.
Here $B'_i$ and $D'_i$ are the resources still available at the current time once the resource allocation at the next time step is known. They can be calculated as follows:
The bound term is the maximum transmit power in the feasible action set at the next time step, under the current power allocation scheme $[P_{ir}, 0]$. Formula (1), which is also used in the two remaining cases, is applied again to limit the optimal power and obtain the final scheme. The total amount of data transmitted over the two time slots can then be expressed as follows:
For an inter-satellite link followed by a satellite-to-ground link, the optimal power allocation scheme is solved with the Lagrange multiplier method.
In formulas (2) and (3), the corresponding term is substituted. This yields $B'_i$ and $D'_i$ and then the optimal scheme, expressed as follows:
The optimal power is limited by formula (1) to obtain the final scheme. The total amount of data transmitted over the two time slots can then be expressed as follows:
For a satellite-to-ground link followed by an inter-satellite link, the optimal power scheme is solved with the Lagrange multiplier method.
The optimal power is limited by formula (1) to obtain the optimal power scheme. The total amount of data transmitted over the two time slots can then be expressed as follows:
In summary, $P_{ir}$ is varied, and the corresponding two-slot total data amount is computed for each value. Comparing these totals identifies the optimal power allocation scheme, the one that maximizes the total. This dimension therefore reflects the effect of the power allocation over two consecutive time slots, the current one and the next.
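For two consecutive satellite-to-ground slots, the two-slot transmit-power split can be illustrated with textbook water-filling followed by the rounding-down constraint. This Python sketch makes four assumptions:
- the rate has the form $\log_2(1 + P H)$ per slot;
- the two slots have the same length τ;
- the two slots share the transmit-energy budget $B_{[i,i+1]}$;
- as in the text, the next-slot channel is estimated by the mean of the historical channel parameters.

The helper names are hypothetical, and the patent's exact formulas are not reproduced here.

```python
import math
from statistics import mean


def water_filling(gains, total_power):
    """Textbook water-filling: P_k = max(0, mu - 1/g_k) with sum(P_k) = total_power."""
    levels = sorted(1.0 / g for g in gains)  # inverse channel gains ("floor" heights)
    for k in range(len(levels), 0, -1):
        mu = (total_power + sum(levels[:k])) / k  # water level if the k best channels are active
        if mu > levels[k - 1]:  # all k active channels lie below the water level
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]


def quantize(power, delta, p_max_feasible):
    """Round down to the power grid with step delta and cap at the feasible maximum."""
    return min(math.floor(power / delta + 1e-9) * delta, p_max_feasible)


def two_slot_power(h_now, h_history, energy_budget, tau, delta, p_max_feasible):
    """Split an assumed shared transmit-energy budget over slots i and i+1."""
    h_next = mean(h_history)  # next-slot channel estimated from history
    p_now, p_next = water_filling([h_now, h_next], energy_budget / tau)
    # Only the current slot's power is applied now, so only it is snapped to the grid.
    return quantize(p_now, delta, p_max_feasible), p_next
```

The small tolerance in `quantize` guards against floating-point round-off, for example 0.3 / 0.1 evaluating slightly below 3.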
(4b4) Computing the fourth dimension: the fourth dimension indicates whether network resources are fully used when energy is abundant, so that energy is not wasted. When the energy supply is abundant, the satellite should schedule resources with the largest energy consumption. This frees battery capacity so that more solar energy can be harvested and stored for subsequent time slots. The feature function $f_4(S_i,P_i)$ is expressed as follows:
where the maximum-consumption term is the largest energy that any action in the feasible action set of the satellite's current time slot can consume. This dimension reflects the fact that the on-board battery capacity has an upper limit.
(4b5) Computing the fifth dimension: in the second dimension, the feature value for data overflow is defined as 0, because the amount of data actually received does not match the energy spent, so energy is wasted. The fifth dimension complements the second: it indicates that when energy is abundant, the energy wasted through data overflow is negligible. The feature function $f_5(S_i,P_i)$ is expressed as follows:
This dimension reflects the fact that the capacity of the satellite's data buffer has an upper limit.
(4b6) Computing the sixth dimension: the sixth dimension concerns the received power allocation. Because the data buffer has a limited capacity, a higher received power does not necessarily mean that more data is stored. The feature function $f_6(S_i,P_i)$ therefore reflects the effectiveness of the received power allocation, as follows:
$f_6(S_i,P_i)$ is the sixth feature function of the $i$-th time slot. This dimension represents how efficiently the satellite receives data, i.e., whether the energy spent matches the data actually stored in the data buffer.
The outputs of the six feature functions form the six-dimensional feature vector that guides the satellite's power allocation. This vector is an important basis for selecting action $P_i$ in step (4c).
Because the state space of the remote sensing satellite network is continuous and infinite, the action-value function cannot be solved directly. The invention approximates it linearly: the dot product of the six-dimensional feature vector and the weight vector gives the approximate action value, which is the basis for selecting the power allocation scheme. The feature vector is obtained from the six feature functions, which are functions of the state and the action. Its definition is closely tied to the environment of the network and to attributes of the remote sensing satellite such as its task transmission characteristics.
The method is based on reinforcement learning and helps the satellite perform joint scheduling of multidimensional resources using only causal data. Defining six-dimensional feature functions and a weight vector for linear approximation resolves the problem of an infinite state space. Constructing two independent networks with the same structure but different parameters mitigates, to some extent, overestimation during parameter updates.
Example 4
The intelligent resource joint scheduling method is the same as in Examples 1-3, and the ε-greedy policy used in step (6) is the same as in step (4). The difference is that the exploration rate of the ε-greedy policy in step (6) is $\varepsilon'_i$, which is updated as $\varepsilon'_i = \varepsilon_{i+1}$.
Because step (6) pre-allocates the power of the next time slot, the exploration rate $\varepsilon'_i$ is reduced further relative to step (4). As "learning" proceeds, the exploration rate keeps decreasing conservatively, i.e., the power allocation scheme with the largest approximate action value is selected with increasing probability.
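The text specifies only two things: ε (and α) decrease as iterations accumulate, and $\varepsilon'_i = \varepsilon_{i+1}$. The decay law itself is not given. The sketch below assumes, purely for illustration, an exponential decay with a floor. The function name and the `decay` and `floor` values are hypothetical.

```python
def decayed(initial, i, decay=0.999, floor=0.01):
    """HYPOTHETICAL schedule: initial * decay**i, never below floor."""
    return max(floor, initial * decay ** i)


eps0 = 0.5
i = 100
eps_i = decayed(eps0, i)  # exploration rate used in step (4)
eps_pre_i = decayed(eps0, i + 1)  # step (6): eps'_i = eps_{i+1}, slightly smaller
```

The same helper could drive the learning rate α. The floor keeps a small amount of exploration alive late in training.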
Example 5
The intelligent resource joint scheduling method is the same as in Examples 1-4. In step (8), $\omega_{actor}$ is updated by gradient descent; this parameter update is the process of continuously "learning" and optimizing the weight vector. It comprises the following steps:
(8a) Sampling: the samples $(f_i,P_i,R_i,f'_i,P'_i)$ in the experience memory that share the same action $P_i$ are placed in one group, and the number of samples in each group is denoted $M_P$. The parameters of the satellite-to-ground link and the inter-satellite link are updated independently, so each link type has its own experience memory. The Actor network parameters of each link type are sampled and updated from that link's memory.
(8b) Computing the cost function $Y(\omega_{actor})$ of each group of samples: for each group, the cost function $Y(\omega_{actor})$ is calculated as follows:
(8c) Updating $\omega_{actor}$: gradient descent on the cost function with respect to $\omega_{actor}$ completes the update of $\omega_{actor}$:
where the subscript n is the time slot index of the sample, and the learning-rate term is the learning rate of the current time slot. The assignment completes the update of the Actor network parameters $\omega_{actor}$.
The invention obtains the approximate action-value function by linear approximation and uses it to approximate the action-value function. During "learning", the Actor network parameters $\omega_{actor}$ are therefore updated in the direction that decreases the error between the action-value function and the approximate action-value function.
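The experience memory of step (6) and the update of (8a)-(8c) can be sketched together in Python. The memory is a bounded FIFO queue, so new samples replace the oldest once it is full, and samples are grouped by action as in (8a).

The exact cost function $Y(\omega_{actor})$ is not reproduced in this text. The sketch assumes, as an illustration consistent with the dual-network description, the mean squared error between two quantities:
- the Critic-based target $R_n + \gamma\,\hat{Q}(f'_n,P'_n;\omega_{critic})$;
- the Actor estimate $\hat{Q}(f_n,P_n;\omega_{actor})$.

One gradient step is then taken on the row of $\omega_{actor}$ for the group's action. The class and function names are hypothetical.

```python
from collections import defaultdict, deque

import numpy as np


class ExperienceMemory:
    """Bounded FIFO memory: once full, each new sample evicts the oldest one."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, f, p, r, f_next, p_next):
        self.buffer.append((f, p, r, f_next, p_next))

    def groups_by_action(self):
        """(8a): group samples that share the same action index P_i."""
        groups = defaultdict(list)
        for sample in self.buffer:
            groups[sample[1]].append(sample)
        return dict(groups)


def update_actor(w_actor, w_critic, group, alpha, gamma):
    """(8b)-(8c) under the ASSUMED cost Y = mean(td**2) / 2 for one action group."""
    w_new = np.array(w_actor, dtype=float)  # copy; the caller's array is untouched
    p = group[0][1]
    grad = np.zeros(w_new.shape[1])
    for f, _, r, f_next, p_next in group:
        target = r + gamma * float(w_critic[p_next] @ f_next)  # Critic gives the target
        td = target - float(w_new[p] @ f)  # error of the (pre-update) Actor estimate
        grad -= td * np.asarray(f, dtype=float)  # dY/dw for this sample
    w_new[p] -= alpha * grad / len(group)  # step opposite to the mean gradient
    return w_new


# One memory per link type, as in (8a).
memories = {"sat_ground": ExperienceMemory(1000), "inter_sat": ExperienceMemory(1000)}
```

The target comes from the Critic matrix, which is only refreshed every $T_{copy}$ slots. This is the dual-network idea the source credits with reducing overestimation.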
The following detailed example further illustrates the invention.
Example 6
The intelligent resource joint scheduling method is the same as in Examples 1-5. With reference to Fig. 1, it comprises the following steps:
Fig. 2 is a schematic diagram of the remote sensing satellite network model of the invention. The network mainly comprises remote sensing satellites, relay satellites and ground stations. Inter-satellite links are established between satellites, and satellite-to-ground links are established between satellites and ground stations. An inter-satellite link carries either unidirectional transmission from a remote sensing satellite to a relay satellite or bidirectional transmission between relay satellites. A satellite-to-ground link carries only unidirectional transmission from a remote sensing satellite or relay satellite to a ground station. To complete continuous information transmission tasks, the remote sensing satellites, relay satellites and ground stations must cooperate. The network is modeled in the following steps:
(1a) Specifying the scale of the network: the ground stations, remote sensing satellites and relay satellites together form the remote sensing satellite network.
- Ground stations: $GS = \{GS_1, GS_2, \ldots, GS_J\}$, where J is the total number of ground stations. The ground stations receive the data transmitted by the remote sensing satellites and relay satellites and are the destination of all data.
- Remote sensing satellites: $RSS = \{RSS_1, RSS_2, \ldots, RSS_K\}$, where K is the total number of remote sensing satellites. A remote sensing satellite samples environmental information with its on-board equipment, stores it as data and then transmits the data to a ground station or relay satellite, so it is the origin of the remote sensing data.
- Relay satellites: $RS = \{RS_1, RS_2, \ldots, RS_L\}$, where L is the total number of relay satellites. A relay satellite helps the remote sensing satellites store and forward data.
(1b) Time discretization: for convenience of analysis, continuous time is divided into time slots of equal length τ, and the total number of time slots of network operation is I. The $i$-th time slot is $slot_i = [t_i, t_{i+1}]$, where $i = 0, 1, \ldots, I-1$, $t_0$ is the operation start time and $t_I$ is the operation end time. The subscript $i$ denotes the value of a variable at the start time $t_i$ of the $i$-th time slot.
(1c) Define the state set S: the state consists of multidimensional resources, namely the battery state B_i, the remaining battery energy in joules (J); the data buffer state D_i, the amount of data stored in the buffer in bits; the channel parameter H_i, which reflects the data transmission capacity of the link and can be obtained from onboard sensors; and E_i^H, the solar energy the remote sensing satellite can absorb, in joules (J). H_i and E_i^H describe the environment, while B_i and D_i describe the state of the remote sensing satellite itself.
(1d) Define the action set A: an action is expressed as power values in watts (W). The receive power, the satellite-to-ground transmit power, and the inter-satellite transmit power each take values in a discrete finite set with a fixed step. The power level determines how much data is received and transmitted and how much energy is consumed. The transmit power set can be written as $A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$ and the receive power set as $A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$, where δ is the step size, 0 means that no data is received or transmitted, and P_MAX is the maximum receive or transmit power (the step and maximum differ from set to set). Because the distances and channel conditions between a remote sensing satellite and a ground station differ from those between a remote sensing satellite and a relay satellite, the maximum satellite-to-ground transmit power is generally larger than the maximum inter-satellite transmit power.
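A minimal sketch of the discrete power sets in (1d), assuming P_MAX is an integer multiple of δ. The helper name `power_set` is hypothetical; the numeric values are the ones listed for the simulation in Example 7. A satellite's joint action is a (receive power, transmit power) pair, so the action set for one link type is the Cartesian product of the two sets.

```python
from itertools import product


def power_set(step: float, p_max: float) -> list[float]:
    """Discrete power levels {0, step, 2*step, ..., p_max} in watts (p_max a multiple of step)."""
    n = int(round(p_max / step))
    return [k * step for k in range(n + 1)]


# Example 7 sets, written {start:step:stop} in the text.
A_tsg = power_set(1, 80)   # satellite-to-ground transmit power, 0..80 W
A_tss = power_set(1, 70)   # inter-satellite transmit power, 0..70 W
A_r = power_set(30, 30)    # receive power, {0, 30} W

# Joint (receive, transmit) actions for a satellite-to-ground link.
A_sg = list(product(A_r, A_tsg))
print(len(A_tsg), len(A_tss), len(A_r), len(A_sg))  # 81 71 2 162
```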
(1e) Define the reward R: the reward is the value the agent obtains when performing an action moves it from one state to another; it is a specific number that can be set to suit the scenario. Since the task of a remote sensing satellite is data transmission, the invention uses the amount of data the remote sensing satellite transmits in one time slot (in GB) as the reward R_i: the more data transmitted, the larger the reward, and vice versa.
(1f) Define the action-value function Q^π(S_i, P_i): the expected return the agent obtains after performing action P_i in state S_i and thereafter following policy π. The action value thus evaluates the effect of an action on all subsequent time slots: $Q^{\pi}(S_i, P_i) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{i+k} \mid S_i, P_i\right]$
where the policy π is the rule by which the agent selects actions. Because the transmission task is continuing and has no terminal state, a discount factor γ (0 ≤ γ < 1) is applied to the cumulative reward so that the return converges.
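The discounted return behind the action-value function in (1f) can be illustrated with a short Python sketch over a finite reward sequence. The helper `discounted_return` is hypothetical. It also shows why γ < 1 matters for a continuing task: with a constant per-slot reward R, the infinite sum converges to R/(1 - γ), which is 10R for γ = 0.9.

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Return sum_k gamma**k * rewards[k], accumulated backwards (Horner-style)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three slots with reward 1: 1 + 0.9 + 0.81.
print(discounted_return([1.0, 1.0, 1.0], 0.9))
# A long run of constant reward 1 approaches 1 / (1 - 0.9) = 10.
print(discounted_return([1.0] * 1000, 0.9))
```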
Step 2, pre-generate the data for one topology period, including link on/off states, link connection durations, and the time the remote sensing satellite spends in sunlight in each slot; these serve as the environmental parameters of the method.
(2a) For a given set of remote sensing satellite orbits, relay satellite orbits, and ground station positions, simulate the network in STK and export the link connectivity, position, and environment information over one topology period T (measured in slots). This includes the start time, end time, and duration of every link between the remote sensing satellite and each ground station and relay satellite; the longitude, latitude, and altitude of the remote sensing satellite; and the time the remote sensing satellite spends in sunlight.
(2b) Re-slot the exported data using τ as the slot length. For each slot, record the time the remote sensing satellite spends in sunlight, the on/off state of each link and its duration τ_ih, and the position of the remote sensing satellite at the start of the slot.
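Step (2b) amounts to intersecting the time windows exported from STK with the slot grid. The sketch below assumes the STK report has already been parsed into (start, end) pairs in seconds from the simulation epoch; the parsing step and the name `per_slot_overlap` are hypothetical. The same routine gives τ_ih for link-up windows (a slot's link is on when the result is positive) and the sunlit time per slot for sunlight windows.

```python
def per_slot_overlap(windows, t0, tau, num_slots):
    """For each slot [t0 + i*tau, t0 + (i+1)*tau], return the total time (s) covered by
    the (start, end) windows, e.g. link-up windows or sunlit windows."""
    durations = [0.0] * num_slots
    for start, end in windows:
        first = max(0, int((start - t0) // tau))
        last = min(num_slots - 1, int((end - t0) // tau))
        for i in range(first, last + 1):
            lo = t0 + i * tau
            hi = lo + tau
            # Length of the intersection of [start, end] with [lo, hi].
            durations[i] += max(0.0, min(end, hi) - max(start, lo))
    return durations


# One link window from 100 s to 700 s on a 300 s grid.
print(per_slot_overlap([(100.0, 700.0)], 0.0, 300.0, 4))  # [200.0, 300.0, 100.0, 0.0]
```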
Step 3, initialize the parameters required by the intelligent resource joint scheduling method: the number of slots per period T, the onboard battery capacity B_max, the battery capacity threshold B_min, the data storage capacity D_max, the static power consumption P_cons, the slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic parameter update interval T_copy, the Actor parameter update interval T_train, the total number of training slots I, the current slot index i, and the discount factor γ.
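The Step 3 parameters could be grouped in a configuration object such as the hypothetical `SchedulerParams` below. Values marked Ex. 7 come from the simulation parameters in Example 7; the battery and buffer capacities, ε and α are placeholders because the text does not give them, and the total slot count I is left out because its value in Example 7 is unclear.

```python
from dataclasses import dataclass


@dataclass
class SchedulerParams:
    b_max: float                # onboard battery capacity B_max (J), placeholder
    d_max: float                # data storage capacity D_max (bit), varied in Example 8
    tau: float = 300.0          # slot length tau (s), Ex. 7
    period_slots: int = 288     # slots per topology period T, Ex. 7
    p_cons: float = 10.0        # static power consumption P_cons (W), Ex. 7
    gamma: float = 0.9          # discount factor gamma, Ex. 7
    epsilon: float = 0.1        # exploration rate epsilon, placeholder
    alpha: float = 0.01         # learning rate alpha, placeholder

    @property
    def b_min(self) -> float:
        """Battery threshold B_min = 0.6 * B_max (Ex. 7)."""
        return 0.6 * self.b_max

    @property
    def t_copy(self) -> int:
        """Critic update interval T_copy = 3T slots (Ex. 7)."""
        return 3 * self.period_slots

    @property
    def t_train(self) -> int:
        """Actor update interval T_train = 2T slots (Ex. 7)."""
        return 2 * self.period_slots

    @property
    def static_energy(self) -> float:
        """Static energy per slot B_s = P_cons * tau (J)."""
        return self.p_cons * self.tau


params = SchedulerParams(b_max=1.0e5, d_max=8.0e9)  # capacities are illustrative only
print(params.t_copy, params.t_train, params.static_energy)  # 864 576 3000.0
```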
Step 4, observe the state S_i and use an ε-greedy policy to select an action P_i from the feasible action set as the power allocation for the current slot.
Referring to Fig. 3, Fig. 3 is the sub-flowchart of action selection under the ε-greedy policy in the invention; action selection is implemented as follows:
(4a) From the state S_i, compute the feasible action set {A_f}_i, i.e. all actions P_i that satisfy the following conditions; each action consists of a receive power P_ir and a transmit power P_it:
where C_it(P_it, H_i) is the link transmission rate at the start of the i-th slot under the current transmit power P_it and channel parameter H_i, calculated as follows:
Here, B_c (Hz) denotes the channel bandwidth.
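The rate formula itself is not reproduced above. The sketch below assumes the Shannon form C_it = B_c log2(1 + P_it H_i), with H_i an effective gain that already includes path loss and noise; this is an assumption consistent with the water-filling used later, not necessarily the patent's exact expression. It also computes the Step 5 reward R_i = τ_ih · C_it(P_it, H_i). Apart from the 250 MHz bandwidth from Example 7, the numbers in the usage line are illustrative only.

```python
import math


def link_rate(p_t: float, h: float, bandwidth_hz: float) -> float:
    """ASSUMED Shannon form C = B_c * log2(1 + P_t * H) in bit/s, where H is an
    effective channel gain (1/W) that already includes path loss and noise."""
    return bandwidth_hz * math.log2(1.0 + p_t * h)


def slot_reward(tau_ih: float, p_t: float, h: float, bandwidth_hz: float) -> float:
    """Step 5 reward R_i = tau_ih * C_it(P_it, H_i): bits sent while the link is up."""
    return tau_ih * link_rate(p_t, h, bandwidth_hz)


# Illustrative numbers: B_c = 250 MHz (Ex. 7), P_t = 3 W, H = 1 /W, link up for 300 s.
bits = slot_reward(300.0, 3.0, 1.0, 250e6)
print(bits / 8e9)  # 18.75 (decimal GB)
```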
(4b) For each state-action pair (S_i, P_i), compute the six-dimensional feature vector f_i(S_i, P_i). The feature vector is a function of the state and the action; each element is a value no greater than 1 that scores how good the action is in the current state with respect to one criterion. The six dimensions are as follows.
First dimension: indicates whether the action accounts for the battery energy state, i.e. whether the energy consumed by the action prevents the battery from overflowing as solar energy is absorbed. Its feature function f_1(S_i, P_i) is:
Second dimension: indicates whether the action accounts for the data buffer state, i.e. whether the amount of data sent prevents the buffer from overflowing as new data is received. Its feature function f_2(S_i, P_i) is:
where the received-data quantity is the amount of data the remote sensing satellite receives in the i-th slot, and DR_max is the maximum amount of data that can be received.
Third dimension: indicates whether the action agrees with the optimal power allocation. Because the satellite-to-ground and inter-satellite link models differ, the feature function f_3(S_i, P_i) must be defined separately for each link situation.
Here the optimal allocation comprises a receive power and a transmit power. It is obtained with the Lagrange multiplier method by maximizing the total amount of data transmitted in the current slot and the next slot, subject to the multidimensional resource constraints. Four cases are distinguished according to how the link switches.
Case 1, two consecutive satellite-to-ground links: the optimal power is obtained by water-filling.
where B_s = P_cons × τ is the static energy consumption, P_i^WF is the optimal (water-filling) power, and the average of past channel parameters (denoted H̄ here) serves as the estimate of the channel parameter in the next slot. B_[i,i+1] is the maximum energy that can be allocated to data transmission over slots i and i+1 together. To guarantee that the optimal power is feasible, the following constraint is imposed:
where the upper bound is the maximum transmit power in the current feasible action set, ⌊·⌋ denotes rounding down, and δ_i is the step size of the transmit power set in this slot. The total amount of data transmitted over the two slots is then:
where the next-slot bound is the maximum transmit power in the feasible action set after the transition to the next slot under the current optimal allocation.
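Case 1 can be illustrated with classic two-slot water-filling. The sketch assumes the per-slot rate is proportional to log2(1 + P·H), takes the estimated next-slot channel (the historical average) as `h_next`, and treats B_[i,i+1] as a given energy budget; the function names are hypothetical. `quantize` applies the feasibility constraint described above: round down to the power grid with step δ_i and cap at the maximum feasible transmit power.

```python
import math


def water_fill_two_slots(h_now: float, h_next: float, energy_budget: float, tau: float):
    """Maximise log2(1 + P1*h_now) + log2(1 + P2*h_next) subject to
    tau * (P1 + P2) <= energy_budget and P1, P2 >= 0 (ASSUMED rate model)."""
    p_total = energy_budget / tau                 # average power available over both slots
    inv_lo, inv_hi = sorted([1.0 / h_now, 1.0 / h_next])
    mu = (p_total + inv_lo + inv_hi) / 2.0        # water level if both slots get power
    if mu <= inv_hi:                              # weaker slot would sit below the floor:
        mu = p_total + inv_lo                     # pour everything into the stronger slot
    return max(0.0, mu - 1.0 / h_now), max(0.0, mu - 1.0 / h_next)


def quantize(p: float, step: float, p_max_feasible: float) -> float:
    """Round down to the power grid (step delta_i) and cap at the feasible maximum."""
    return min(math.floor(p / step) * step, p_max_feasible)


# Illustrative: H_i = 1, estimated next-slot H = 0.5, budget 900 J, tau = 300 s.
print(water_fill_two_slots(1.0, 0.5, 900.0, 300.0))  # (2.0, 1.0)
print(quantize(2.7, 1.0, 80.0))                      # 2.0
```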
Case 2, two consecutive inter-satellite links: the optimal power is obtained by linear programming.
where B'_i and D'_i are the resources still available in the current slot once the resource allocation for the next slot is known. They are calculated as follows:
The bound used here is the maximum transmit power in the next slot's feasible action set when the current allocation is [P_ir, 0]. Applying the constraint of formula (4) to the two cases gives the feasible optimal allocation, and the total amount of data transmitted over the two slots is then:
Case 3, an inter-satellite link followed by a satellite-to-ground link: the optimal power allocation is obtained by the Lagrange multiplier method.
Replacing the corresponding next-slot term in formulas (5) and (6) with its counterpart for this case gives B'_i and D'_i, from which the optimal allocation is obtained as follows:
Constraining the optimal power with formula (4) gives the feasible optimal allocation; the total amount of data transmitted over the two slots is then:
Case 4, a satellite-to-ground link followed by an inter-satellite link: the optimal power allocation is obtained by the Lagrange multiplier method.
Constraining the optimal power with formula (4) gives the feasible optimal allocation; the total amount of data transmitted over the two slots is then:
Vary P_ir and compute the corresponding two-slot data totals; comparing them yields the optimal power allocation that maximizes the total, which characterizes the power allocation for the current link.
Fourth dimension: indicates whether network resources are fully used when energy is plentiful, so that energy is not wasted. Its feature function f_4(S_i, P_i) is:
where the energy term is the maximum energy that can be consumed by any action in the feasible action set of the remote sensing satellite in the current slot.
Fifth dimension: in the second dimension, the feature value for data overflow is defined as 0, because the amount of data actually received does not match the energy spent, which wastes energy. The fifth dimension complements the second: when energy is plentiful, the energy wasted by data overflow is negligible. Its feature function f_5(S_i, P_i) is:
Sixth dimension: reflects the allocation of receive power. Its feature function f_6(S_i, P_i) is:
(4c) Select action P_i by the ε-greedy policy: with probability 1 - ε the remote sensing satellite selects the action that maximizes the approximate action-value function, and with probability ε it selects an action at random:
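The selection rule in (4c), combined with the linear approximation of the action-value function by a weight vector (Q ≈ ω·f), can be sketched as follows. The feature function passed in is a hypothetical stand-in; the six feature functions themselves are not reproduced here. Setting ε = 0 gives the pure greedy policy mentioned in claim 1, step (10).

```python
import random


def q_value(weights, features):
    """Linear approximation Q(S, P) ~ omega . f(S, P)."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(actions, feature_fn, weights, epsilon, rng=random):
    """With probability epsilon return a random feasible action, otherwise the
    action that maximises the approximate action value omega . f(S, P)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(weights, feature_fn(a)))


def toy_features(action):
    """HYPOTHETICAL six-element feature map, not the patent's f_1..f_6."""
    p_r, p_t = action
    return [p_t / 80, 0, 0, 0, 0, p_r / 30]


actions = [(0, 0), (30, 10), (0, 20)]              # (receive W, transmit W)
weights = [1.0, 0, 0, 0, 0, 0.5]
print(epsilon_greedy(actions, toy_features, weights, epsilon=0.0))  # (30, 10)
```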
Step 5, compute the reward of this power allocation as R_i = τ_ih · C_it(P_it, H_i). If i < I, proceed to the next step; otherwise go to Step 10.
Step 6, observe the pre-state S'_i, use the ε-greedy policy to select an action P'_i from the feasible action set as the preselected power allocation for the next slot, and store the sample (f_i, P_i, R_i, f'_i, P'_i) in the experience memory for later network parameter updates.
(6a) Observe the pre-state S'_i: H'_i can be estimated as the average over past slots, and the transitions of B_i and D_i are:
where D'_i is the data buffer pre-state and B'_i is the battery pre-state.
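The transition formulas are not reproduced above, so the following is only an assumed bookkeeping for (6a): H'_i is estimated by an incremental running mean of past channel parameters, and the battery and buffer are treated as saturating stores, which is where the overflow penalized by f_1 and f_2 arises. How the spent energy and the sent and received data are derived from P_i is left to the caller; all names are hypothetical.

```python
class RunningMean:
    """Incremental mean of past channel parameters, used as the estimate H'_i."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def update(self, h: float) -> float:
        self.n += 1
        self.mean += (h - self.mean) / self.n
        return self.mean


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def pre_state(b, d, harvested, spent, received, sent, b_max, d_max):
    """ASSUMED transitions: the battery gains harvested solar energy and loses spent
    energy, the buffer gains received data and loses sent data; both saturate."""
    b_next = clip(b + harvested - spent, 0.0, b_max)   # excess energy is lost (overflow)
    d_next = clip(d + received - sent, 0.0, d_max)     # excess data is lost (overflow)
    return b_next, d_next


est = RunningMean()
for h in (1.0, 2.0, 3.0):
    est.update(h)
print(est.mean)                                                            # 2.0
print(pre_state(950.0, 100.0, 300.0, 200.0, 50.0, 120.0, 1000.0, 1000.0))  # (1000.0, 30.0)
```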
(6c) Select the action P'_i from the feasible action set with the ε-greedy policy, following the same procedure as Step 4 except that the ε-greedy parameter ε_i of Step 4 is replaced by ε'_i.
(6d) Store the sample (f_i, P_i, R_i, f'_i, P'_i) in the experience memory.
Step 7, check whether i % T_copy = 0. If so, update ω_critic by setting ω_critic = ω_actor and proceed to the next step; otherwise proceed directly to the next step.
Step 8, check whether i % T_train = 0. If so, update ω_actor by gradient descent and proceed to the next step; otherwise proceed directly to the next step.
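Steps 5 to 10 form a loop in which the Critic weights are refreshed by copying every T_copy slots and the Actor weights are trained every T_train slots. The skeleton below sketches that control flow only; `step_fn` and `update_fn` are hypothetical hooks standing in for Steps 4 to 6 and for the gradient update of Step 8, and the slot index is assumed to start at 0.

```python
def train_loop(total_slots, t_copy, t_train, step_fn, update_fn, omega_actor):
    """Control flow of Steps 5-10 (sketch). step_fn(i, omega_actor) returns one sample
    (f_i, P_i, R_i, f'_i, P'_i); update_fn(memory, omega_actor, omega_critic) returns
    new Actor weights. Both hooks are hypothetical."""
    omega_actor = list(omega_actor)
    omega_critic = list(omega_actor)
    memory = []                                   # experience memory
    i = 0
    while i < total_slots:                        # Step 5: stop once i reaches I
        memory.append(step_fn(i, omega_actor))    # Steps 4-6: act, pre-select, store sample
        if i % t_copy == 0:                       # Step 7: omega_critic <- omega_actor
            omega_critic = list(omega_actor)
        if i % t_train == 0:                      # Step 8: gradient update of omega_actor
            omega_actor = update_fn(memory, omega_actor, omega_critic)
        i += 1                                    # Step 9: advance to the next slot
    return omega_critic                           # Step 10: output omega_critic


# Dummy hooks: every "training" call just adds 1 to each weight.
train_calls = []


def dummy_step(i, w):
    return ([1.0], (0, 10), 1.0, [0.0], (0, 10))


def dummy_update(memory, w_actor, w_critic):
    train_calls.append(len(memory))
    return [w + 1.0 for w in w_actor]


print(train_loop(12, 6, 4, dummy_step, dummy_update, [0.0]), train_calls)  # [2.0] [1, 5, 9]
```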
Referring to fig. 4, the specific implementation of this step is as follows:
(8a) In the experience memory, group samples with the same P_i together, and denote the number of samples in each group M_P.
(8b) For each group of samples, compute the cost function Y(ω_actor):
(8c) Minimize the cost function with respect to ω_actor by gradient descent, which updates ω_actor:
where the subscript n denotes the slot index corresponding to each sample, and the learning rate α takes its value for the current slot.
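The cost function Y(ω_actor) is not reproduced above. A common choice for a pair of linear approximators like this one is the squared temporal-difference error with the target computed from the Critic weights, Y = 1/(2M_P) Σ_n (R_n + γ ω_critic·f'_n - ω_actor·f_n)^2; the sketch below assumes that form and takes one gradient step per action group. It is an assumed instance, not necessarily the patent's exact update, and all names are hypothetical.

```python
from collections import defaultdict


def dot(w, f):
    return sum(a * b for a, b in zip(w, f))


def actor_update(memory, w_actor, w_critic, gamma, alpha):
    """ASSUMED cost per action group: Y = 1/(2 M_P) * sum_n (R_n + gamma*w_c.f'_n - w_a.f_n)^2.
    One gradient-descent step per group; the Critic-based target is held constant."""
    groups = defaultdict(list)
    for f, p, r, f_next, _p_next in memory:       # (8a) group samples by action P_n
        groups[p].append((f, r, f_next))
    w = list(w_actor)
    for samples in groups.values():
        m = len(samples)                           # M_P
        grad = [0.0] * len(w)
        for f, r, f_next in samples:               # (8b) TD error against the Critic target
            td = r + gamma * dot(w_critic, f_next) - dot(w, f)
            for k, fk in enumerate(f):
                grad[k] -= td * fk / m             # dY/dw_a
        w = [wk - alpha * gk for wk, gk in zip(w, grad)]   # (8c) w_a <- w_a - alpha * grad
    return w


memory = [([1.0, 0.0], (0, 10), 1.0, [0.0, 0.0], (0, 10))]
print(actor_update(memory, [0.0, 0.0], [0.0, 0.0], gamma=0.9, alpha=0.5))  # [0.5, 0.0]
```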
Step 9, update the remote sensing satellite network: S_{i+1} = S'_i, P_{i+1} = P'_i, i = i + 1; then return to Step 5 for the next iteration.
Step 10, end the intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network and output ω_critic for joint resource scheduling.
The invention addresses the optimization of remote sensing satellite transmission performance in a time-varying network environment that cannot be measured in advance. Building on reinforcement learning, it defines a weight vector and maintains two independent networks of identical structure. Guided by the algorithm, the two networks repeatedly update their parameters from accumulated experience, and the converged parameters then guide multidimensional resource scheduling. The method applies to causal networks with unknown statistical characteristics; it handles the continuous network state space and, to some extent, avoids overestimation in parameter updates. In short, it is well suited to future remote sensing satellite networks operating in dynamic, randomly changing environments and provides guidance for their planning and optimization.
The convergence and effectiveness of the present invention are explained below in conjunction with simulation experiments:
Example 7
The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Embodiments 1-6.
Simulation conditions and content:
Simulation software: STK, MATLAB, Spyder.
Simulation scenario: 3 relay satellites, 6 ground stations, and 1 remote sensing satellite.
Simulation parameters: the satellite-to-ground transmit power set A_tsg is {0:1:80}, the inter-satellite transmit power set A_tss is {0:1:70}, and the receive power set A_r is {0:30:30} (MATLAB start:step:stop notation in W, so A_r = {0, 30}). The satellite-to-ground channel bandwidth is 250 MHz. Each slot also incurs a fixed static power consumption of 10 W, and whenever the satellite chooses to receive data, the reception rate is always 100 Mbit/s. The learning-process parameters are τ = 300 s, T = 288 slots, γ = 0.9, T_copy = 3×T slots, T_train = 2×T slots, I = 10002×T slots, and B_min = 0.6×B_max.
Simulation content: using the above scenario and software and the network topology of Fig. 2, the convergence of the method is shown first. Then, taking the data buffer capacity as the resource variable, the effectiveness of the method is shown by comparison with three other methods.
Simulation result analysis:
Referring to Fig. 5, Fig. 5 shows the cost values obtained in simulation. The abscissa is the Actor network parameter update slot and the ordinate is the cost value; the dashed line is the satellite-to-ground link and the solid line is the inter-satellite link. Fig. 5 uses the cost value to measure how closely the approximate action-value function matches the true action-value function. As Fig. 5 shows, for both the satellite-to-ground link and the inter-satellite link, the cost first decreases and then converges as learning proceeds. This is because the Actor network parameters of both links are updated by gradient descent, i.e. always in the direction opposite to the gradient, so the error between the approximate and true action-value functions gradually shrinks; in addition, sample generation is affected by the ε-greedy policy, so overall the cost falls while fluctuating. Once learning has run long enough, the exploration rate and learning rate become small and the parameters change little between updates, which appears in the figure as convergence of the cost value. The simulated cost curves agree with the theoretical analysis and confirm the convergence of the method.
Example 8
The intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network is the same as in Embodiments 1-6, and the simulation conditions and content are the same as in Example 7.
Referring to Fig. 6, Fig. 6 compares the average per-period rewards obtained in simulation. The abscissa is the data buffer capacity and the ordinate is the average per-period reward of the remote sensing satellite network. The solid dot line represents the proposed intelligent resource joint scheduling method, the solid dot line represents the greedy resource joint scheduling method, the solid square line represents the Q-learning resource joint scheduling method, and the solid triangle line represents the random resource joint scheduling method. Using the average amount of data transmitted per period as the performance metric, Fig. 6 shows how the data buffer margin affects network performance and how the four methods differ. The proposed method performs best overall, followed by the greedy method and then the Q-learning method, while the random method performs worst. With the resources in the other two dimensions fixed, the performance of all four methods first rises and then levels off as the data buffer margin increases; this saturation is caused by the limits on the other resources. Notably, the proposed method and the greedy method perform similarly when the data buffer margin is small. With a small buffer margin, transmitting the stored data needs little energy and the other resources are relatively plentiful, so the energy demand can always be met. Storing and reusing energy then brings little benefit, and the proposed method performs much like the greedy method.
In short, the invention discloses an intelligent resource joint scheduling method for an environment-uncertain remote sensing satellite network, which addresses the optimization of remote sensing satellite transmission performance in a time-varying network environment that cannot be measured in advance. Its implementation comprises: building an environment-uncertain remote sensing satellite network model; generating the environmental parameter data; initializing the parameters required by the method; directing the satellite to allocate power; pre-transitioning the network state; directing the satellite to pre-allocate power; checking and updating the Critic network parameter ω_critic; checking and updating the Actor network parameter ω_actor; updating the network state, action, and current slot index; and obtaining the network parameter ω_critic that guides joint, multidimensional resource scheduling. The invention takes full account of the transmission characteristics of remote sensing satellite tasks and the particular network environment of remote sensing satellites, and obtains the environmental parameters for a given network scale and configuration through software simulation. In addition, it defines six-dimensional feature functions within a reinforcement learning framework and approximates the action-value function linearly with a weight vector. Two independent networks of identical structure repeatedly update their parameters from accumulated experience. The method applies to remote sensing satellite networks with unknown statistical characteristics; it handles the continuous network state space and, to some extent, avoids overestimation in parameter updates.
The invention is well suited to future remote sensing satellite networks operating in dynamic, randomly changing environments and provides guidance for their planning and optimization.
Claims (5)
1. An intelligent resource joint scheduling method for an environment-uncertain remote sensing satellite network, characterized in that the established network model fits the environment and the resource scheduling scenario of the remote sensing satellite network, and that reinforcement learning avoids both directly solving a high-complexity planning problem and dealing with a continuous, infinite state space, the method comprising the following steps:
(1) Establish an environment-uncertain remote sensing satellite network model: first determine the scale and parameters of the remote sensing satellite network, including the numbers and positions of the remote sensing satellites and ground stations, and then define the state set S, action set A, reward R, and action-value function Q^π of the network. The state set is S = {B × D × H × E_H}; at the start of the i-th slot, the network state S_i consists of four parts: the current battery energy B_i, the amount of data already in the buffer D_i, the channel parameter H_i, and the absorbed solar energy E_i^H. Dynamic channel models of the satellite-to-ground and inter-satellite links are built according to Recommendations ITU-R P.618-13, ITU-R P.838, and ITU-R P.839, and the channel parameter H_i is obtained by simulation. Considering the orbital characteristics of the satellites, a dynamic energy harvesting model is built, and the absorbed solar energy E_i^H is obtained by simulation. The action set A = {A_r × A_t} consists of the receive power set A_r and the transmit power set A_t, expressed as $A_r = \{0, \delta, 2\delta, \dots, P_{MAX}\}$ and $A_t = \{0, \delta, 2\delta, \dots, P_{MAX}\}$ respectively, where δ is the step size, 0 means that no data is received or transmitted, and P_MAX is the maximum power; for A_t, P_MAX takes the satellite-to-ground value when the transmission link is a satellite-to-ground link and the inter-satellite value otherwise. The reward R is the amount of data sent by the satellite, evaluated at the start of the slot. The action-value function Q^π(S_i, P_i) is the expected return the agent obtains after performing action P_i in state S_i under policy π. This completes the environment-uncertain remote sensing satellite network model;
(2) Generate the environmental parameter data: simulate the remote sensing satellite network model in STK to export the raw environmental data for one topology period, and process it in MATLAB to obtain the link on/off states, link connection durations, remote sensing satellite positions, and the sunlit duration in each slot; these serve as the environmental parameter data of the intelligent resource joint scheduling method;
(3) Initialize the parameters required by the intelligent resource joint scheduling method: the number of slots per period T, the onboard battery capacity B_max, the battery capacity threshold B_min, the data storage capacity D_max, the static power consumption P_cons, the slot length τ, the exploration rate ε, the Critic network parameter ω_critic, the Actor network parameter ω_actor, the learning rate α, the Critic parameter update interval T_copy, the Actor parameter update interval T_train, the total number of training slots I, the current slot index i, and the discount factor γ;
(4) Direct the satellite to allocate power: observe the state S_i; for each feasible action, extract the feature vector f_i(S_i, P_i) of the state-action pair using the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor and use the ε-greedy policy to select an action P_i from the feasible action set as the power allocation for the current slot, thereby directing the satellite's power allocation;
(5) Pre-transition the state of the remote sensing satellite network: compute the reward R_i in the environment-uncertain remote sensing satellite network model and check whether the iteration has finished: if i = I, go to step (10); otherwise proceed to the next step and execute a new iteration;
(6) Direct the satellite to pre-allocate power: observe the pre-state S'_i; for each feasible action, extract the feature vector f'_i(S'_i, P'_i) of the state-action pair using the defined six-dimensional feature functions, which reflect the working characteristics of the remote sensing satellite and the influence of the environment; combine it with the Actor network parameter ω_actor and use the ε-greedy policy to select an action P'_i from the feasible action set as the preselected power allocation for the next slot; and store the sample (f_i, P_i, R_i, f'_i, P'_i) in the experience memory for later network parameter updates;
(7) Check and update the Critic network parameter ω_critic: compute the current slot index i modulo the Critic update interval T_copy; if i % T_copy = 0, update the Critic network parameter by setting ω_critic = ω_actor and proceed to the next step; otherwise proceed directly to the next step;
(8) Check and update the Actor network parameter ω_actor: compute the current slot index i modulo the Actor update interval T_train; if i % T_train = 0, update the Actor network parameter ω_actor by gradient descent and proceed to the next step; otherwise proceed directly to the next step;
(9) Update the state, action, and current slot index of the remote sensing satellite network: S_{i+1} = S'_i, P_{i+1} = P'_i, i = i + 1; this completes one iteration; then go to step (5);
(10) Obtain the network parameter ω_critic that guides joint scheduling: output the network parameter ω_critic trained by the method, which ends the intelligent resource joint scheduling method for the environment-uncertain remote sensing satellite network; in practical applications, a joint resource scheduling scheme is generated from this parameter with the greedy policy (the ε-greedy policy with ε = 0).
2. The intelligent resource joint scheduling method for an environment-uncertain remote sensing satellite network according to claim 1, wherein selecting an action P_i from the feasible action set with the ε-greedy policy in step (4), directing the satellite to allocate power, comprises the following steps:
(4a) Compute the feasible action set {A_f}_i: from the network state S_i, compute the feasible action set {A_f}_i of all actions P_i that satisfy the following conditions; each action consists of a receive power P_ir and a transmit power P_it:
where τ_ih is the link duration at the start of the i-th slot, and C_it(P_it, H_i) is the link transmission rate at the start of the i-th slot under the current transmit power P_it and channel parameter H_i, calculated as follows:
here, B_c (Hz) denotes the channel bandwidth;
(4b) Compute the six-dimensional feature vector f_i(S_i, P_i): for each state-action pair (S_i, P_i), compute the feature vector f_i(S_i, P_i) with the six feature functions; the feature vector is a function of the state and the action, and each of its elements is a value no greater than 1 that scores the action P_i performed in state S_i with respect to that dimension;
(4c) Select action P_i according to the ε-greedy policy: under the ε-greedy policy, the remote sensing satellite either selects, with probability 1 - ε, the action in the feasible action set {A_f}_i with the highest approximate action value, or, with probability ε, selects an action at random; the single result is the action P_i selected by the ε-greedy policy, expressed as follows:
3. The intelligent resource joint scheduling method according to claim 1, wherein computing the six-dimensional feature vector f_i(S_i, P_i) in step (4b) specifically considers the following six dimensions.
(4b1) Compute the first dimension: it indicates whether the action accounts for the battery energy state, i.e. whether the energy consumed by the action prevents the battery from overflowing as solar energy is absorbed. Its feature function f_1(S_i, P_i) is:
(4b2) Compute the second dimension: it indicates whether the action accounts for the data buffer state, i.e. whether the amount of data sent prevents the buffer from overflowing as new data is received. Its feature function f_2(S_i, P_i) is:
where the received-data quantity is the amount of data the satellite receives in the i-th slot, the transmitted-data quantity is the amount of data the satellite transmits in the i-th slot, and DR_max is the maximum amount of received data;
(4b3) Compute the third dimension: it indicates whether the action agrees with the optimal power allocation. Because the satellite-to-ground and inter-satellite link models differ, the feature function f_3(S_i, P_i) must be defined separately for each link situation.
Wherein,the method comprises two parts of receiving power and transmitting power, and is obtained under the constraint of multidimensional resources by taking the Lagrange multiplier method as a target to maximize the total data quantity transmitted in the current time slot and the next time slot. There are four categories that can be classified according to the link switching situation.
For two consecutive satellite-ground links, the optimal power is solved according to the water-filling theorem.
Here, $B_s = P_{const} \times \tau$ denotes the static energy consumption, $P_i^{WF}$ denotes the optimal power, and the average of the historical channel parameters serves as an estimate of the channel parameter at the next time instant. $B_{[i,i+1]}$ denotes the maximum energy available for data transmission over the $i$-th and $(i+1)$-th time slots together. To guarantee the feasibility of the optimal power, the following constraint is imposed:
where the maximum transmit power in the current feasible action set bounds the power, $\lfloor\cdot\rfloor$ denotes the round-down (floor) operation, and $\delta_i$ denotes the step size of the transmit power set in this time slot. The total amount of data transmitted over the two time slots can then be expressed as follows:
where the corresponding maximum denotes the largest transmit power in the feasible action set at the next time instant, given the power allocation applied at the current time.
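The satellite-ground case can be illustrated with a short sketch under several assumptions that are not stated in the text above: a Shannon-type rate $W\log_2(1+g\,p)$ per slot, the energy budget $B_{[i,i+1]}$ (static consumption already deducted) turned into a power budget $B_{[i,i+1]}/\tau$ shared by the two slots, the next-slot gain estimated as the mean of historical gains, and a feasibility step that rounds down to the power grid $\delta_i$ and then caps at the action set's maximum transmit power. For simplicity the same cap is used for both slots, whereas the text uses a next-slot maximum. The bandwidth `bandwidth` and all function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def water_filling(gains: Sequence[float], total_power: float) -> List[float]:
    """Classic water-filling: p_k = max(0, mu - 1/g_k) with sum(p_k) = total_power."""
    if any(g <= 0 for g in gains):
        raise ValueError("channel gains must be positive")
    if total_power <= 0:
        return [0.0] * len(gains)
    floors = [1.0 / g for g in gains]                    # 1/g_k: "floor" of each slot
    order = sorted(range(len(gains)), key=lambda k: floors[k])  # strongest slot first
    mu = 0.0
    for active in range(len(order), 0, -1):              # try the largest active set first
        mu = (total_power + sum(floors[k] for k in order[:active])) / active
        if mu > floors[order[active - 1]]:               # weakest active slot gets p > 0
            break
    return [max(0.0, mu - x) for x in floors]


def quantize_power(p_opt: float, step: float, p_max: float) -> float:
    """Round down to the transmit-power grid (step delta_i), then cap at p_max."""
    if step <= 0:
        raise ValueError("step must be positive")
    return min(math.floor(p_opt / step) * step, p_max)


def two_slot_plan(
    g_now: float,
    g_history: Sequence[float],
    energy_budget: float,  # B_[i,i+1]: energy available for data transmission
    tau: float,            # slot duration
    step: float,           # delta_i
    p_max: float,          # max transmit power in the feasible action set (same for both slots here)
    bandwidth: float,      # hypothetical link bandwidth W
) -> Tuple[float, float, float]:
    """Return (p_i, p_{i+1}, total data over both slots) under the assumed rate model."""
    g_next = mean(g_history)                             # estimate of the next-slot gain
    p_now, p_next = water_filling([g_now, g_next], energy_budget / tau)
    p_now = quantize_power(p_now, step, p_max)
    p_next = quantize_power(p_next, step, p_max)
    data = tau * bandwidth * (math.log2(1 + g_now * p_now) + math.log2(1 + g_next * p_next))
    return p_now, p_next, data
```

Rounding down rather than to the nearest grid point keeps the quantized powers within the energy budget that the water-filling solution already exhausts.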
For two consecutive inter-satellite links, the optimal power is solved by linear programming.
Here, $B'_i$ and $D'_i$ denote the resources remaining available at the current time once the resource allocation for the next time is known. They can be calculated as follows:
where the corresponding term denotes the maximum transmit power in the feasible action set at the next time instant when the power allocation $[P_{ir}, 0]$ is applied at the current time. Formula (1) is again used to constrain the optimal power and obtain the feasible power, and the total amount of data transmitted over the two time slots is then expressed as follows:
For a switch from an inter-satellite link to a satellite-ground link, the optimal power allocation is solved by the Lagrange multiplier method.
Replacing the corresponding term in formulas (2) and (3) yields $B'_i$ and $D'_i$, from which the optimal power is obtained as follows:
The optimal power is constrained by formula (1) to obtain the feasible power, and the total amount of data transmitted over the two time slots is then expressed as follows:
For a switch from a satellite-ground link to an inter-satellite link, the optimal power allocation is solved by the Lagrange multiplier method;
the optimal power is constrained by formula (1) to obtain the feasible power, and the total amount of data transmitted over the two time slots can then be expressed as follows:
In summary, $P_{ir}$ is varied and the corresponding two-slot total data amount is calculated for each value. Comparing these totals yields the optimal set of power allocations, namely the one at which the total reaches its maximum, and this set characterizes the power distribution under the current link;
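The final search in (4b3), varying $P_{ir}$ and keeping the allocation with the largest two-slot data total, reduces to an argmax over candidates. In the sketch below, `allocate` and `total_data` are hypothetical placeholders for whichever per-case solution applies (water-filling, linear programming or the Lagrange multiplier method), and the candidate grid passed in is likewise an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

Allocation = Tuple[float, float]  # hypothetical: e.g. transmit powers in slots i and i+1


def best_allocation(
    candidate_rx_powers: Iterable[float],
    allocate: Callable[[float], Allocation],
    total_data: Callable[[float, Allocation], float],
) -> Tuple[float, float, Allocation]:
    """Return (max total data, best P_ir, induced allocation); ties keep the first candidate."""
    best: Optional[Tuple[float, float, Allocation]] = None
    for p_ir in candidate_rx_powers:
        alloc = allocate(p_ir)            # per-case optimal power for this P_ir
        data = total_data(p_ir, alloc)    # two-slot total data amount
        if best is None or data > best[0]:
            best = (data, p_ir, alloc)
    if best is None:
        raise ValueError("no candidate receive powers given")
    return best
```

Because the action set is discrete with step $\delta_i$, enumerating candidate receive powers is cheap, and the per-case solvers stay interchangeable behind the two callables.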
(4b4) Computing the fourth dimension: the fourth dimension indicates whether network resources are fully utilized when energy is abundant, so as to avoid wasting energy. Its feature function $f_4(S_i,P_i)$ is expressed as follows:
where the maximum-energy term denotes the maximum energy that can be consumed within the feasible action set of the remote sensing satellite in the current time slot;
(4b5) Computing the fifth dimension: in the second dimension, the feature value corresponding to data overflow is defined as 0, because the amount of data actually received does not match the energy spent, which wastes energy. The fifth dimension supplements the second: when energy is abundant, the energy wasted by data overflow is negligible. Its feature function $f_5(S_i,P_i)$ is expressed as follows:
(4b6) Computing the sixth dimension: the sixth dimension represents the receive power allocation, and its feature function $f_6(S_i,P_i)$ is expressed as follows:
where $f_6(S_i,P_i)$ is the sixth feature function for the $i$-th time slot;
The results of the six feature functions together form the six-dimensional feature vector that guides the satellite's power allocation, and this vector is the key basis for selecting the action $P_i$ in step (4c).
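The sketch below shows one way the six feature values could be assembled, with illustrative stand-ins for the fourth and fifth dimensions. The exact expressions are not reproduced in this text, so every formula here is a labelled assumption: "energy abundant" is taken to mean the battery would otherwise fill up, $f_4$ is the fraction of the maximum consumable energy that the action uses, and $f_5$ forgives a data overflow (an $f_2$ value of 0) when energy is abundant. The final clipping enforces the stated property that no element exceeds 1; the lower bound of 0 is an additional assumption.

```python
from typing import List, Sequence


def energy_abundant(battery: float, harvest: float, capacity: float) -> bool:
    """ASSUMPTION: energy counts as abundant if the battery would fill up this slot."""
    return battery + harvest >= capacity


def f4_utilization(abundant: bool, consumed: float, max_consumable: float) -> float:
    """Hypothetical f4: share of the maximum consumable energy used by the action."""
    if not abundant or max_consumable <= 0:
        return 1.0  # full utilization only demanded when energy is abundant (assumption)
    return min(consumed / max_consumable, 1.0)


def f5_overflow_tolerance(abundant: bool, f2_value: float) -> float:
    """Hypothetical f5: a data overflow (f2 == 0) is tolerated when energy is abundant."""
    return 1.0 if abundant else f2_value


def feature_vector(values: Sequence[float]) -> List[float]:
    """Collect the six feature values f1..f6, clipping each into [0, 1]."""
    if len(values) != 6:
        raise ValueError("expected six feature values")
    return [min(max(v, 0.0), 1.0) for v in values]
```

The resulting list is what a linear value approximation, such as the ε-greedy sketch for step (4c), would take as its feature input.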
5. The intelligent resource joint scheduling method of claim 1, wherein updating $\omega_{actor}$ according to a gradient descent strategy in step (8) comprises the following steps:
(8a) Sampling: the samples $(f_i, P_i, R_i, f'_i, P'_i)$ in the experience memory that share the same $P_i$ are placed in one group, and the number of samples in each group is denoted $M_P$;
(8b) Computing the cost function $Y(\omega_{actor})$ of each group of samples: for each group, the cost function $Y(\omega_{actor})$ is calculated as:
(8c) Updating $\omega_{actor}$: gradient descent is applied to the cost function with respect to $\omega_{actor}$, which completes the update of $\omega_{actor}$:
$$\omega_{actor} = \omega_{actor} - \Delta\omega_{actor}$$
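Steps (8a) to (8c) can be sketched as a grouped, semi-gradient update of a linear value approximation. Because the cost function $Y(\omega_{actor})$ is not reproduced in this text, the sketch assumes a mean squared SARSA-style temporal-difference error, $Y(\omega)=\frac{1}{2M_P}\sum (R_i+\gamma\,\omega\cdot f'_i-\omega\cdot f_i)^2$, with the target held fixed when differentiating. The discount factor `gamma`, the learning rate `alpha` and all function names are hypothetical. Whether one weight vector or one per action group is used is not stated here, so the update simply acts on whichever vector is passed in.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# (f_i, P_i, R_i, f'_i, P'_i) as stored in the experience memory
Sample = Tuple[Sequence[float], Hashable, float, Sequence[float], Hashable]


def group_by_action(memory: Sequence[Sample]) -> Dict[Hashable, List[Sample]]:
    """Step (8a): samples sharing the same action P_i form one group of size M_P."""
    groups: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in memory:
        groups[sample[1]].append(sample)
    return dict(groups)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def update_actor_weights(
    omega: Sequence[float], group: Sequence[Sample], alpha: float = 0.1, gamma: float = 0.9
) -> List[float]:
    """Steps (8b)-(8c): one gradient step on the assumed cost Y(omega) for one group."""
    m_p = len(group)
    if m_p == 0:
        return list(omega)
    grad = [0.0] * len(omega)
    for f, _p, r, f_next, _p_next in group:
        td_error = r + gamma * dot(omega, f_next) - dot(omega, f)
        for k, f_k in enumerate(f):
            grad[k] -= td_error * f_k / m_p          # dY/domega with the target held fixed
    delta = [alpha * g for g in grad]               # Delta omega_actor
    return [w - d for w, d in zip(omega, delta)]    # omega_actor = omega_actor - Delta omega_actor
```

Averaging over the $M_P$ samples of a group keeps the step size independent of how many samples each action has accumulated in memory.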
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011251365.3A CN112422171B (en) | 2020-11-09 | 2020-11-09 | Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112422171A (en) | 2021-02-26 |
CN112422171B (en) | 2021-09-03 |
Family ID: 74781837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011251365.3A Active CN112422171B (en) | 2020-11-09 | 2020-11-09 | Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112422171B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1643860A (en) * | 2002-03-28 | 2005-07-20 | Marconi UK Intellectual Property Ltd. | Method and arrangement for dynamic allocation of network resources |
US20070021998A1 (en) * | 2005-06-27 | 2007-01-25 | Road Ltd. | Resource scheduling method and system |
CN104573856A (en) * | 2014-12-25 | 2015-04-29 | Beijing Institute of Technology | Spacecraft resource constraint processing method based on time topological sorting |
CN106060858A (en) * | 2016-05-18 | 2016-10-26 | Soochow University | Method and apparatus for software-defined satellite networking based on an extended OpenFlow protocol |
CN106100719A (en) * | 2016-06-06 | 2016-11-09 | Xidian University | Efficient resource scheduling method for small-satellite networks based on Earth observation tasks |
CN106230497A (en) * | 2016-09-27 | 2016-12-14 | Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences | Two-layer scheduling method and system for information network resources |
CN110099388A (en) * | 2019-03-21 | 2019-08-06 | Shixun Satellite Technology Co., Ltd. | Satellite mobile communication method integrated with a 5G network |
US20200019435A1 (en) * | 2018-07-13 | 2020-01-16 | Raytheon Company | Dynamic optimizing task scheduling |
Events
- 2020-11-09: application CN202011251365.3A filed in CN, published as CN112422171B (status: Active)
Non-Patent Citations (4)
Title |
---|
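Yu Wang et al.: "Joint Scheduling of Observation and Transmission in Earth Observation Satellite Networks", GLOBECOM 2017 - 2017 IEEE Global Communications Conference * |
Zhou Di et al.: "Network operation, maintenance and resource management and control technology for mega-constellation systems" (in Chinese), Space-Integrated-Ground Information Networks * |
Ci Yuanzhuo et al.: "Research on multi-satellite joint observation scheduling in uncertain environments" (in Chinese), Systems Engineering and Electronics * |
Li Yuqing: "Research on spacecraft observation scheduling in dynamic uncertain environments" (in Chinese), China Doctoral Dissertations Full-text Database * |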
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113133078B (en) * | 2021-04-19 | 2022-04-08 | Xidian University | Lightweight inter-satellite switching device and method for giant low-orbit satellite networks |
CN113133078A (en) * | 2021-04-19 | 2021-07-16 | Xidian University | Lightweight inter-satellite switching device and method for giant low-orbit satellite networks |
CN113378366A (en) * | 2021-06-03 | 2021-09-10 | Beijing University of Civil Engineering and Architecture | Guidance information layout method for guidance signs in integrated passenger transport hubs |
CN113378366B (en) * | 2021-06-03 | 2023-08-18 | Beijing University of Civil Engineering and Architecture | Guidance information layout method for guidance signs in integrated passenger transport hubs |
CN113572517B (en) * | 2021-07-30 | 2022-06-24 | Harbin Institute of Technology | Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning |
CN113572517A (en) * | 2021-07-30 | 2021-10-29 | Harbin Institute of Technology | Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning |
CN113613301B (en) * | 2021-08-04 | 2022-05-13 | Beihang University | Air-space-ground integrated network intelligent switching method based on DQN |
CN113613301A (en) * | 2021-08-04 | 2021-11-05 | Beihang University | Air-space-ground integrated network intelligent switching method based on DQN |
CN114726431A (en) * | 2022-03-02 | 2022-07-08 | Wuhan University | Beam hopping multiple access method for low-earth-orbit satellite constellations |
CN114726431B (en) * | 2022-03-02 | 2023-12-12 | National Computer Network and Information Security Management Center | Beam hopping multiple access method for low-orbit satellite constellations |
CN116436510A (en) * | 2023-04-28 | 2023-07-14 | GalaxySpace (Chengdu) Communication Co., Ltd. | Method, device and storage medium for transmitting application data by using a relay satellite |
CN116436510B (en) * | 2023-04-28 | 2024-05-17 | GalaxySpace (Chengdu) Communication Co., Ltd. | Method, device and storage medium for transmitting application data by using a relay satellite |
CN117459122A (en) * | 2023-11-09 | 2024-01-26 | Chengdu Benyuan Xingtong Technology Co., Ltd. | Resource allocation method for coordinating scheduling mechanism of terminal equipment and satellite service life |
CN117459122B (en) * | 2023-11-09 | 2024-07-05 | Chengdu Benyuan Xingtong Technology Co., Ltd. | Resource allocation method for coordinating scheduling mechanism of terminal equipment and satellite service life |
Also Published As
Publication number | Publication date |
---|---|
CN112422171B (en) | 2021-09-03 |
Similar Documents
Publication | Title |
---|---|
CN112422171B (en) | Intelligent resource joint scheduling method under uncertain environment remote sensing satellite network |
Liu et al. | Cooperative offloading and resource management for UAV-enabled mobile edge computing in power IoT system |
Zhou et al. | Machine learning-based resource allocation in satellite networks supporting internet of remote things |
Gunduz et al. | Designing intelligent energy harvesting communication systems |
CN110427261A (en) | Edge computing task allocation method based on deep Monte Carlo tree search |
CN110113190A (en) | Offloading delay optimization method in a mobile edge computing scenario |
CN114626306B (en) | Method and system for guaranteeing freshness of regulation and control information of park distributed energy |
CN102299854B (en) | Opportunistic network environment-oriented multi-object routing decision making system |
CN113382060B (en) | UAV trajectory optimization method and system for Internet of Things data collection |
CN109451556A (en) | Method for charging wireless sensor networks based on UAVs |
Hu et al. | Edge intelligence for real-time data analytics in an IoT-based smart metering system |
Zhao et al. | Adaptive multi-UAV trajectory planning leveraging digital twin technology for urban IIoT applications |
CN117459112A (en) | Mobile edge caching method and device in LEO satellite networks based on graph convolutional networks |
CN109413746B (en) | Optimized energy allocation method in a communication system powered by hybrid energy |
CN116566466A (en) | Multi-objective dynamic-preference satellite-ground collaborative computation offloading method for low-orbit satellite constellations |
CN115912430A (en) | Cloud-edge collaboration-based resource allocation method and system for large-scale energy storage power stations |
CN116431326A (en) | Multi-user dependent task offloading method based on edge computing and deep reinforcement learning |
Wang et al. | Trajectory planning of UAV-enabled data uploading for large-scale dynamic networks: A trend prediction based learning approach |
CN115412156A (en) | Urban monitoring-oriented satellite energy-carrying Internet of things resource optimization allocation method |
Zhao et al. | Online Trajectory Optimization for Energy-Efficient Cellular-Connected UAVs With Map Reconstruction |
Zhang et al. | Collaborative Task Offloading Optimization for Satellite Mobile Edge Computing Using Multi-Agent Deep Reinforcement Learning |
CN115361688B (en) | Industrial wireless edge gateway optimization layout scheme based on machine learning |
CN113840306B (en) | Distributed wireless network access decision method based on network local information interaction |
CN115629540A (en) | Satellite Internet of things online resource joint allocation method based on meta reinforcement learning |
Shen et al. | Cost-effective task offloading and trajectory optimization in UAV assisted edge networks with DDPG |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |