CN115412156B

CN115412156B - Urban monitoring-oriented satellite energy-carrying Internet of things resource optimal allocation method

Info

Publication number: CN115412156B
Application number: CN202211006067.7A
Authority: CN
Inventors: 李源; 许海涛; 徐佳康; 杨仁金; 张海旺; 吕挺
Original assignee: Beijing Penghu Wuyu Technology Development Co ltd
Current assignee: Beijing Penghu Wuyu Technology Development Co ltd
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2023-07-14
Anticipated expiration: 2042-08-22
Also published as: CN115412156A

Abstract

The invention provides an urban-monitoring-oriented resource optimization and allocation method for the satellite energy-carrying Internet of Things, comprising the following steps: S1, constructing a low-orbit satellite (LEO) assisted urban monitoring network model that provides energy transmission and data acquisition services for multiple monitoring nodes; S2, performing cluster head election and resource allocation with a K-Means-based monitoring node cluster head election algorithm; S3, optimizing network resource allocation by jointly considering data acquisition, energy transmission and LEO energy consumption; S4, based on a Markov decision process problem model and the DDPG algorithm, performing multi-objective joint optimization of data acquisition and energy transmission in the LEO-assisted urban monitoring network. The optimization objective is to maximize the uplink data collection amount and the downlink energy transmission amount, and to reduce the LEO energy consumption as much as possible, by jointly optimizing the LEO flight decisions and resource allocation.

Description

Urban monitoring-oriented satellite energy-carrying Internet of things resource optimal allocation method

Technical Field

The invention relates to the field of wireless communication of the low-orbit satellite Internet of things, in particular to an urban monitoring-oriented satellite energy-carrying Internet of things resource optimal allocation method.

Background

In a low-orbit satellite (LEO) assisted urban monitoring network, the LEO provides energy transmission and data acquisition services for multiple monitoring nodes through mobile deployment. However, because the monitoring nodes perform different tasks and are unevenly distributed, they differ in data generation rate, distribution density, energy consumption rate and other respects. In addition, monitoring nodes closer to the information collection node bear more communication load and therefore tend to exhaust their energy prematurely. If monitoring nodes do not receive timely data acquisition and energy transmission services, serious energy holes and data loss result.

Disclosure of Invention

The optimization objective of the invention is to maximize the uplink data collection amount and the downlink energy transmission amount, and to reduce the LEO energy consumption as much as possible, by jointly optimizing the LEO flight decisions and resource allocation. When the multi-objective optimization problem is actually formulated, the three objectives conflict to some extent. Finding the best coverage service positions for the flight decisions while also optimizing the resource allocation decisions is a very complex process with considerable computational cost. Furthermore, because the environment is only partially observable, conventional model-based methods such as dynamic programming cannot solve this problem effectively.

Therefore, the invention divides the problem into two parts: cluster head election and resource allocation. First, a clustering model for the LEO-assisted urban monitoring network is provided, together with a corresponding cluster head election algorithm. After the nodes are clustered, a suitable monitoring node is selected from each cluster as the cluster head; the cluster head collects the data of the monitoring nodes in its cluster and forwards it to the LEO.

Then, a resource allocation strategy for the LEO-assisted urban monitoring network is provided, together with a corresponding algorithm. The problem can be described as a Markov decision process, from which the problem model is built. The monitoring nodes in the scenario are densely distributed and the action space is continuous, for which the DQN algorithm is not suitable. DDPG, a classical deep reinforcement learning (DRL) algorithm, has been shown to learn effective policies in continuous action spaces from low-dimensional observations and is therefore suitable for the LEO flight decision problem. Note that the reward in the original DDPG algorithm is a scalar value.

The technical scheme of the invention mainly comprises the following steps:

An urban-monitoring-oriented resource optimization and allocation method for the satellite energy-carrying Internet of Things comprises the following steps:

S1, constructing a low-orbit satellite (LEO) assisted urban monitoring network model that provides energy transmission and data acquisition services for multiple monitoring nodes;

S2, performing cluster head election and resource allocation with a K-Means-based monitoring node cluster head election algorithm;

S3, optimizing network resource allocation by jointly considering data acquisition, energy transmission and LEO energy consumption;

S4, based on a Markov decision process problem model and the DDPG algorithm, performing multi-objective joint optimization of data acquisition and energy transmission in the LEO-assisted urban monitoring network.

The method specifically comprises the following steps:

S1, constructing an LEO-assisted urban monitoring network model. In the scenario, a single LEO provides energy transmission and data acquisition services for multiple monitoring nodes through mobile deployment. The LEO is equipped with a single antenna and each node with multiple antennas; information decoding and energy harvesting are performed separately based on an antenna switching structure.

A transmission queue model of the system is constructed.

A probabilistic channel model is established by jointly considering the occurrence probabilities of line-of-sight and non-line-of-sight links, and is used as the channel model between the LEO and the ground monitoring nodes.

The LEO determines its next action in real time while moving and updates its position. Flight energy consumption, coverage service energy consumption and communication energy consumption are jointly considered to construct the energy consumption model of the system.

The LEO moves to the target node position through its flight speed and yaw angle decisions, and in a sub-slot sends radio frequency signals to the monitoring nodes at a specific transmit power. In this sub-slot, all monitoring nodes within the energy transmission coverage of the satellite are charged; the energy transmission model of the system is built accordingly.

S2, dividing all M monitoring nodes into K clusters with the K-Means algorithm and selecting a suitable monitoring node from each cluster as the cluster head. The cluster head collects the data of the monitoring nodes in its cluster and forwards it to the LEO. The LEO transfers energy to all nodes in the cluster during the coverage service phase. In each time slot, the LEO selects one cluster head node as the target node of the next service, taking the service priority of the nodes into account.

S3, jointly considering the data acquisition requirement, the energy transmission requirement and the LEO energy consumption, defining a multi-objective optimization problem that maximizes the uplink data collection amount and the downlink energy transmission amount through joint optimization of the LEO flight decisions and resource allocation, and solving it.

S4, constructing a problem model according to the Markov decision process. The state space of the system is described first. Then, based on the system state and environment of the LEO-assisted urban monitoring network scenario, the action space is constructed: the action selected by the LEO in a given time slot comprises its flight speed, flight angle, time slot allocation and transmit power allocation. Finally, a reward function is established as the quantitative evaluation of the action taken by the agent in reinforcement learning.

Further:

S1 specifically comprises the following steps:

S101, constructing an LEO-assisted urban monitoring network model

In the scenario, a single LEO provides energy transmission and data acquisition services for multiple monitoring nodes through mobile deployment. The LEO is equipped with a single antenna and each node with multiple antennas; information decoding and energy harvesting are performed separately based on an antenna switching structure.

The duration of each flight task is T > 0, and the total time is divided into equal-length time slots t = 1, 2, …. In each time slot, the monitoring nodes receive the radio frequency signal based on the antenna switching structure, decoding information and harvesting energy at the same time, and the cluster head node uploads its monitoring data to the LEO in the uplink sub-slot.

S102, constructing a transmission queue model

In this scenario, the monitoring nodes are indexed by m = 1, …, M, and the position of node m is denoted $[x_m, y_m]$. For monitoring node m, let $\lambda_m(t)$ denote the data generation rate of the node while it performs its monitoring task in time slot t. The $\lambda_m(t)$ of the different nodes are assumed to follow a Poisson distribution whose parameter is constant during the monitoring task, i.e. $\lambda_m(t) = \lambda_m$.

The length of the data waiting to be uploaded in the data transmission queue of monitoring node m at time slot t is bounded by the maximum storage capacity of the data transmission queue, which is assumed to be the same for all monitoring nodes. When the queue length exceeds this capacity, newly collected data cannot be placed in the node's data buffer and is discarded, causing data overflow.

For the energy transmission requirement, consider the residual energy of monitoring node m at time slot t, and let $\mu_m(t)$ denote the energy consumption rate of the node in time slot t. $\mu_m(t)$ is the same in every time slot, i.e. $\mu_m(t) = \mu_m$, but $\mu_m$ differs between monitoring nodes because of hardware factors and deployment locations. The residual energy is bounded by the maximum storage capacity of the energy queue, which is assumed to be the same for all monitoring nodes. When the energy of a monitoring node is exhausted, the node can no longer provide normal service and an energy hole occurs.
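As an illustration of the two queue models of S102, the following minimal Python sketch advances the data queue and the residual energy of one node by one time slot. The update equations are not reproduced in this text, so the rules used here (uploads leave the queue before new data arrives, and both quantities are clipped to their capacities) and the names `step_data_queue`, `step_energy`, `q_max` and `e_max` are assumptions made for illustration only.

```python
def step_data_queue(q, arrivals, uploaded, q_max):
    """Return (new_queue_length, overflow) after one time slot.

    Assumed rule: uploaded data leaves the queue first, then new data
    arrives; anything above q_max is discarded and counted as overflow.
    """
    backlog = max(q - uploaded, 0.0) + arrivals
    overflow = max(backlog - q_max, 0.0)
    return min(backlog, q_max), overflow


def step_energy(e, mu, harvested, e_max):
    """Return (new_residual_energy, energy_hole) after one time slot.

    Assumed rule: the node spends mu per slot, gains the harvested
    energy, and cannot store more than e_max. An energy hole is
    flagged when the residual energy reaches zero.
    """
    level = min(max(e - mu, 0.0) + harvested, e_max)
    return level, level <= 0.0
```

With these rules, a node whose backlog would exceed `q_max` reports the excess as overflow, which corresponds to the data overflow condition described above.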

S103, constructing a system channel model

A probabilistic channel model is established by jointly considering the occurrence probabilities of the line-of-sight (LOS) and non-line-of-sight (NLOS) channels, and is used as the communication model between the LEO and the ground monitoring nodes. In the loss under this model, $\gamma_0 = (4\pi f_c / c)^{-2}$ is the channel power gain at the reference distance $d_0 = 1$ m, $f_c$ is the carrier frequency, $c$ is the speed of light, $d_m(t)$ is the distance between the LEO and the target node m, and $\mu^{NLOS}$ is the attenuation coefficient of the NLOS link; the loss also depends on the path loss exponent.

For monitoring node m, the LOS probability at time t is:

$$P_t^{LOS}(\theta_m(t)) = \frac{1}{1 + a\exp\left(-b\left(\theta_m(t) - a\right)\right)} \tag{4}$$

where a and b are constants that depend on the carrier frequency and the type of environment, and $\theta_m(t)$ is the elevation angle between the LEO and the target monitoring node, expressed as:

$$\theta_m(t) = \frac{180}{\pi}\sin^{-1}\left(\frac{H}{d_m(t)}\right) \tag{5}$$

The NLOS probability is $P_t^{NLOS}(\theta_m(t)) = 1 - P_t^{LOS}(\theta_m(t))$. The downlink and uplink channel power gains of the communication link between the LEO and the target monitoring node m are denoted $h_m(t)$ and $g_m(t)$, respectively, and together describe the channel power gain between the LEO and the target node.
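The probabilistic channel model of S103 can be sketched in Python as follows. The elevation angle follows equation (5). The sigmoid form of the LOS probability and the probability-weighted average of the LOS and NLOS gains are the forms commonly used with this kind of model and are assumptions here, as are the function names and the path loss exponent argument `alpha`.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def elevation_deg(height, dist):
    """Elevation angle in degrees, equation (5): (180/pi) * asin(H / d)."""
    return math.degrees(math.asin(height / dist))


def p_los(theta_deg, a, b):
    """Assumed sigmoid LOS probability with environment constants a and b."""
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def mean_channel_gain(height, dist, f_c, alpha, mu_nlos, a, b):
    """Assumed expected power gain: LOS and NLOS gains weighted by probability."""
    gamma0 = (4.0 * math.pi * f_c / C_LIGHT) ** -2  # gain at d0 = 1 m
    los_gain = gamma0 * dist ** (-alpha)
    pl = p_los(elevation_deg(height, dist), a, b)
    # The NLOS link is the LOS gain scaled by the attenuation coefficient.
    return pl * los_gain + (1.0 - pl) * mu_nlos * los_gain
```

Setting `mu_nlos` to 1 removes the extra NLOS attenuation, so the result reduces to the plain distance-dependent gain; this is a convenient sanity check for the sketch.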

s104, constructing a system energy consumption model

Assume the LEO flies at a fixed height $H > 0$, and denote its horizontal position at time slot t by $[x_u(t), y_u(t)]$. The LEO determines its next action in real time while moving and updates its position. Flight control of the LEO in this scenario is described by the flight speed $v(t)$, which is limited by the maximum flight speed, and the yaw angle $\theta(t)$, which is constrained to $\theta(t) \in [-\pi, \pi]$. The energy consumption model of the LEO jointly considers flight energy consumption, coverage service energy consumption and communication energy consumption. The propulsion power consumption of the LEO flying at speed $V$ is calculated by:

$$P(V) = P_0\left(1 + \frac{3V^2}{U_{tip}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3 \tag{7}$$

where $P_0$ is the blade profile power in the coverage service (hovering) state and $U_{tip}$ is the tip speed of the rotor blade; $P_i$ and $v_0$ are the induced power and the mean rotor induced velocity in the coverage service state; and, for the parasitic power, $d_0$, $\rho$, $s$ and $A$ are the fuselage drag ratio, air density, rotor solidity and rotor disc area, respectively. The propulsion power consumption of the LEO thus comprises the blade profile power, the induced power and the parasitic power, corresponding to the three terms of equation (7). The power consumption during coverage service is obtained by setting $V = 0$:

$$P_{hov} = P(V = 0) = P_0 + P_i \tag{8}$$

The flight energy consumed by the LEO in time slot t is obtained from this propulsion power.

During the coverage service phase, energy transmission and data acquisition are performed for the monitoring nodes within the LEO coverage range.
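The energy consumption model of S104 can be illustrated with a short Python sketch. It assumes the widely used three-term rotary-wing propulsion model (blade profile, induced and parasitic power), which matches the symbols defined above and reduces to $P_0 + P_i$ at zero speed as in equation (8). The function names and the rule "energy = power × flight duration" in `flight_energy` are assumptions for illustration.

```python
import math


def propulsion_power(v, p0, p_i, u_tip, v0, d0, rho, s, area):
    """Propulsion power at speed v: blade profile + induced + parasitic."""
    blade = p0 * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    # max(..., 0.0) guards against tiny negative values from rounding.
    inner = math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2)
    induced = p_i * math.sqrt(max(inner, 0.0))
    parasitic = 0.5 * d0 * rho * s * area * v ** 3
    return blade + induced + parasitic


def flight_energy(v, duration, **params):
    """Assumed per-slot flight energy: propulsion power times flight duration."""
    return propulsion_power(v, **params) * duration
```

Calling `propulsion_power(0.0, ...)` returns `p0 + p_i`, the coverage service power of equation (8), so the same function covers both the flight state and the coverage service state.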

S105, constructing an energy transmission model

The LEO moves to the target node position through its flight speed and yaw angle decisions and, in the sub-slot $\tau(t)$, transmits a radio frequency signal to the monitoring nodes with transmit power $P_d(t)$, where $P_d(t)$ is limited by the maximum transmit power $P_{max}$. Within $\tau(t)$, all monitoring nodes within the LEO energy transmission coverage are charged, and the received power at monitoring node m is determined by $P_d(t)$ and the downlink channel power gain $h_m(t)$.

A nonlinear energy transfer model is applied as the air-to-ground energy transfer model. Under the RF-EH model, the actual power at the receiving end is expressed in terms of $P_{limit}$, the maximum output DC power, and of c and d, which are constants related to the circuit characteristics.
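The saturation behaviour of the nonlinear RF-EH model in S105 can be sketched in Python as below. The exact equation is not reproduced in this text; the sketch assumes a commonly used logistic form that is zero at zero input power and saturates at `p_limit`, with `c` and `d` as the circuit constants. The function name `harvested_power` is hypothetical.

```python
import math


def harvested_power(p_in, p_limit, c, d):
    """Assumed logistic RF-EH model: zero at zero input, saturating at p_limit."""
    logistic = p_limit / (1.0 + math.exp(-c * (p_in - d)))
    offset = 1.0 / (1.0 + math.exp(c * d))  # normalised logistic value at p_in = 0
    # Shift and rescale so that the curve starts at 0 and ends at p_limit.
    return (logistic - p_limit * offset) / (1.0 - offset)
```

Unlike a linear model, the output here stops growing once the input power is well above `d`, which is the circuit saturation effect that motivates the nonlinear model.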

S2 specifically comprises the following steps:

S201, distance formula design of the K-Means algorithm

For the distance formula of the K-Means algorithm, the monitoring node characteristics are combined with the Euclidean distance to form a joint distance, in which a and b are the length and width of the urban monitoring network model, respectively.

According to the clustering algorithm, the M monitoring nodes are divided into K clusters, each corresponding to a subset of nodes. The nodes in a cluster transmit their monitoring data to the cluster head node, and the LEO transfers energy to all nodes in the cluster during the coverage service phase. The monitoring node acting as cluster head uploads the monitoring data over the uplink during the time $1 - \tau(t)$. The uplink transmit power of the cluster head node depends on, and is positively related to, the total energy harvested during $\tau(t)$, scaled by the energy conversion efficiency $\zeta$, which is a constant.

The upload data rate of cluster head node k is denoted $R_k(t)$.

In each time slot, the LEO selects one cluster head node as the target node of the next service. If the target node is still the current node, the LEO remains in the coverage service state in the next time slot; if the target node changes, the LEO enters the flight state and moves to the position of the target node through its flight speed and yaw angle decisions. The selection of the target node takes into account the service priority of each node, which comprises a data acquisition priority, an energy supply priority and the node distance. The data acquisition priority is set based on the data queue length and the data generation rate of the monitoring node, and the energy supply priority is set in the same way. The service priority of cluster head node k at time slot t is defined as $Q_k(t)$, in which the weights of the data acquisition priority and the energy supply priority are $\alpha = 1$ and $\beta = 5$, respectively.
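The clustering and cluster head election of S2 can be sketched in plain Python as follows. The joint distance formula is not reproduced in this text, so `joint_distance` is an assumed stand-in: positions are scaled by the area length and width and combined with the differences in data generation rate and energy consumption rate (taken to be pre-normalised). Choosing the member closest to the centroid as cluster head is likewise an assumption, and all function and parameter names are hypothetical.

```python
import math
import random


def joint_distance(p, q, length, width):
    """Assumed joint distance between feature tuples (x, y, lam, mu)."""
    dx = (p[0] - q[0]) / length
    dy = (p[1] - q[1]) / width
    return math.sqrt(dx * dx + dy * dy + (p[2] - q[2]) ** 2 + (p[3] - q[3]) ** 2)


def kmeans_cluster_heads(nodes, k, length, width, iters=50, seed=0):
    """Cluster nodes with K-Means and pick one cluster head per cluster.

    Returns (labels, heads): labels[i] is the cluster of node i and
    heads[j] is the index of the member closest to centroid j
    (None for an empty cluster).
    """
    rng = random.Random(seed)
    centroids = [tuple(n) for n in rng.sample(nodes, k)]
    labels = [0] * len(nodes)
    for _ in range(iters):
        # Assignment step: nearest centroid under the joint distance.
        new_labels = [
            min(range(k), key=lambda j: joint_distance(n, centroids[j], length, width))
            for n in nodes
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [n for n, lab in zip(nodes, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    heads = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        heads.append(
            min(idx, key=lambda i: joint_distance(nodes[i], centroids[j], length, width))
            if idx else None
        )
    return labels, heads
```

Each entry of `nodes` is a tuple `(x, y, lam, mu)`. The returned `heads` list gives, per cluster, the index of the node that would collect intra-cluster data and forward it to the LEO.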

S3 specifically comprises the following steps:

Network performance is optimized by jointly considering three aspects: the data acquisition requirement, the energy transmission requirement and the LEO energy consumption.

(1) Data acquisition amount

Data acquisition during the coverage service of the LEO at monitoring node k in time slot t is carried out through the cluster head node, and the corresponding data acquisition amount is expressed as:

$$D_k(t) = R_k(t)\,(1 - \tau(t)) \tag{16}$$

The total data acquisition amount of the LEO over the task period T is obtained by accumulating $D_k(t)$ over the time slots of the period.

(2) Energy transmission amount

In time slot t, the LEO sends data acquisition information and transmits energy to cluster head node k and to the other monitoring nodes within its coverage range. The energy transmission amount of the LEO in time slot t during coverage service at the position of cluster head node k is denoted $E_k(t)$, and the total energy transmission amount of the LEO over the task period T is obtained by accumulating it over the time slots of the period.

(3) Low-orbit satellite LEO energy consumption

According to the state of the LEO in time slot t, the LEO energy consumption is divided into flight energy consumption and coverage service energy consumption. The energy consumption during coverage service comprises the coverage service energy consumption of the LEO and the total downlink transmission energy. If the LEO is in flight, its energy consumption is denoted $E^{uav}(t) = P(v)\,t$. The energy consumption of the LEO in a task period is the total over the time slots of the period.

A data threshold is set: when the data volume in a monitoring node is larger than this threshold, data overflow is considered to occur. An energy threshold is likewise set: when the energy of a monitoring node is less than this threshold, an energy hole is considered to occur. The constants α and β in the threshold definitions are fixed values.

The optimization objective is to maximize the uplink data collection amount and the downlink energy transmission amount by jointly optimizing the LEO flight decisions and resource allocation.

S301, based on the above, the multi-objective optimization problem P1 is defined. In the constraints of the optimization problem, C1 and C2 are the flight speed and yaw angle constraints of the LEO, C3 is the LEO transmit power constraint, and C4 and C5 are the constraints on the data queue and the energy queue of the monitoring nodes, respectively.

A multi-objective joint optimization algorithm, MJDDPG, is adopted for data acquisition and energy transmission in the LEO-assisted urban monitoring network; weight parameters are introduced to describe the preference among the optimization objectives.

In the MJDDPG algorithm, the reward value is expressed as $r w^T$, where w is the vector of weight parameters, so that the reward vector r is converted into scalar form. All weight parameters are selected in the interval [0.0, 1.0] according to the importance of each optimization objective, and the rest of the network structure is the same as in the DDPG algorithm. For the target network, the target value $y_t$ is calculated as follows:

$$y_t = r w^T + \gamma Q'\left(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \tag{28}$$
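The scalarization of the reward vector and the target value of equation (28) can be sketched in Python as follows. The target critic value $Q'(s_{t+1}, \mu'(s_{t+1}))$ is passed in as a plain number `next_q`; the function names and the argument checks are assumptions for illustration and do not reproduce the networks themselves.

```python
def scalarize(reward_vec, weights):
    """Weighted sum r w^T that turns the reward vector into a scalar."""
    if len(reward_vec) != len(weights):
        raise ValueError("reward and weight vectors must have equal length")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0.0, 1.0]")
    return sum(r * w for r, w in zip(reward_vec, weights))


def td_target(reward_vec, weights, gamma, next_q):
    """Target y_t = r w^T + gamma * Q'(s_{t+1}, mu'(s_{t+1})), equation (28)."""
    return scalarize(reward_vec, weights) + gamma * next_q
```

Changing the weights shifts the preference among the objectives without altering the rest of the DDPG update, since only the scalar fed into the target changes.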

s4 specifically comprises the following steps:

s401, defining a state space

In the LEO-assisted urban monitoring network, the state space is jointly determined by the monitoring nodes, the LEO and the environmental information. At time slot t, the state contains: the relative distance between the target monitoring node and the LEO in a Cartesian coordinate system; $N_f(t)$, the cumulative number of times up to time t that the LEO has exceeded the restricted area during the task period; the absolute position of the LEO, $[x_u(t), y_u(t)]$; the number of data-loss nodes $N_d(t)$ and the number of power-off nodes $N_e(t)$, which prompt the LEO to serve high-demand monitoring nodes in time; and the amount of data waiting to be uploaded at the current target node, which guides the LEO to make efficient time slot and power allocation decisions. At time t = 0, $N_f(t)$, $N_d(t)$ and $N_e(t)$ are all 0.

S402, defining an action space

The state space is mapped to a continuous action space, and multi-objective optimization is achieved by jointly optimizing the LEO flight decisions, time slot allocation and power allocation in the LEO-assisted urban monitoring network scenario. The actions selected by the LEO at time slot t comprise the flight speed $v(t)$, the flight angle $\theta(t)$, the time slot allocation $\tau(t)$ and the transmit power allocation $p_d(t)$; all action variables are continuous. The action that the LEO, as the agent, can take in time slot t therefore consists of these four variables.

The flight speed $v(t)$ and the yaw angle $\theta(t)$ lie in the intervals $[0, v_{max}]$ and $[-\pi, \pi]$ respectively, the yaw is represented through $\cos(\theta(t))$, $p_d(t)$ lies in the range $[0, P_{max}]$, and $\tau(t)$ is the proportion of a single time slot allocated to downlink energy transmission.
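As a sketch of how such a continuous action could be produced, the Python code below maps four raw actor outputs to the action variables and their stated ranges, and then moves the LEO horizontally. It assumes raw outputs in [-1, 1] (for example from a tanh output layer) and straight-line motion within a slot; both assumptions, and the names `scale_action`, `next_position` and `slot_len`, are hypothetical.

```python
import math


def scale_action(raw, v_max, p_max):
    """Map four raw actor outputs in [-1, 1] to (v, theta, tau, p_d)."""
    u = [min(max(x, -1.0), 1.0) for x in raw]  # clip to the assumed output range
    v = (u[0] + 1.0) / 2.0 * v_max      # flight speed in [0, v_max]
    theta = u[1] * math.pi              # yaw angle in [-pi, pi]
    tau = (u[2] + 1.0) / 2.0            # downlink share of the slot in [0, 1]
    p_d = (u[3] + 1.0) / 2.0 * p_max    # transmit power in [0, P_max]
    return v, theta, tau, p_d


def next_position(x, y, v, theta, slot_len):
    """Assumed kinematics: straight-line motion at speed v along heading theta."""
    return x + v * math.cos(theta) * slot_len, y + v * math.sin(theta) * slot_len
```

Because every variable is rescaled from the same bounded range, the constraints on speed, yaw angle and transmit power hold by construction rather than through extra penalty terms.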

S403, establishing a reward function

The optimization objective is to optimize network performance by maximizing the data acquisition amount and the energy transmission amount and minimizing the LEO energy consumption, while guaranteeing network quality. The reward is therefore defined as a multidimensional vector with components $r_{dc}(t)$, $r_{eh}(t)$, $r_{ec}(t)$ and $r_{aux}(t)$, where $r_{dc}(t)$, $r_{eh}(t)$ and $r_{ec}(t)$ correspond to the optimization objectives and $r_{aux}(t)$ is a penalty term.

The reward values depend on the service that the LEO provides to the nodes in time slot t. $D_k(t)$ gives the reward for the total amount of data collected by the LEO during the coverage service phase at monitoring node k: the larger the total data amount, the larger the reward. $E_k(t)$ gives the reward for the total amount of energy transmitted by the LEO during the coverage service phase at monitoring node k: the larger the total energy transmission amount, the larger the reward. The third component reflects the energy consumption of the LEO in time slot t: the flight energy consumption if the LEO is in flight, or the coverage service energy consumption plus the communication energy consumption if it is in the coverage service state.

Once the target monitoring node falls within the coverage radius of the LEO, the LEO enters coverage service and performs data acquisition and energy transfer. Otherwise the LEO is in the flight phase, and $r_{dc}(t)$ and $r_{eh}(t)$ are both 0.

For the auxiliary penalty function $r_{aux}(t)$, the first two terms relate to the distance between the LEO and the target monitoring node. By penalizing wrong flight decisions, the LEO is motivated to preserve network quality during learning.

Drawings

FIG. 1 is a model building diagram of S1;

FIG. 2 is a flowchart of a cluster head election algorithm of S2;

FIG. 3 is a flowchart of the low orbit satellite LEO assisted urban monitoring network resource allocation strategy formulation of S3;

FIG. 4 is a Markov decision process problem model of S4;

FIG. 5 is a DDPG algorithm architecture diagram.

Detailed Description

Embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be understood that the scope of the invention is not limited to the specific embodiments.

S1, as shown in FIG. 1, specifically comprises the following:

In the scenario, a single LEO provides energy transmission and data acquisition services for multiple monitoring nodes through mobile deployment. The LEO is equipped with a single antenna and each node with multiple antennas; information decoding and energy harvesting are performed separately based on an antenna switching structure. Depending on their monitoring tasks, the monitoring nodes differ in data generation rate, distribution density, energy consumption rate and other respects.

Because the energy of the LEO is limited, the duration of each flight task is T > 0. For ease of analysis, the total time is divided into equal-length time slots t = 1, 2, …. In each time slot, the monitoring nodes receive the radio frequency signal based on the antenna switching structure, decoding information and harvesting energy at the same time, and the cluster head node uploads its monitoring data to the LEO in the uplink sub-slot.

S102, constructing a transmission queue model

In this scenario, the monitoring nodes are indexed by m = 1, …, M, and the position of node m is denoted $[x_m, y_m]$. For monitoring node m, let $\lambda_m(t)$ denote the data generation rate of the node while it performs its monitoring task in time slot t. Different monitoring nodes differ in $\lambda_m(t)$ because of deployment location and hardware factors. The $\lambda_m(t)$ of the different nodes are assumed to follow a Poisson distribution whose parameter is constant during the monitoring task, i.e. $\lambda_m(t) = \lambda_m$.

The length of the data transmission queue of monitoring node m is bounded by a maximum capacity that is the same for all monitoring nodes; when the queue length exceeds this capacity, data overflow occurs.

For the energy transmission requirement, the residual energy of monitoring node m is likewise bounded by a maximum capacity that is the same for all monitoring nodes; when the energy is exhausted, an energy hole occurs.

S103, constructing a system channel model

Because the scenario under study is an urban area with heavy building blockage, the free-space propagation channel model is no longer applicable. A probabilistic channel model is therefore established by jointly considering the occurrence probabilities of Line of Sight (LOS) and Non-Line of Sight (NLOS) channels, and is used as the communication model between the LEO and the ground monitoring nodes.

In the corresponding loss under this model, $\gamma_0 = (4\pi f_c / c)^{-2}$ is the channel power gain at the reference distance $d_0 = 1$ m, $f_c$ is the carrier frequency, $c$ is the speed of light, and $d_m(t)$ is the distance between the LEO and the target node m.

For monitoring node m, the LOS probability at time t depends on the elevation angle between the LEO and the node:

$$\theta_m(t) = \frac{180}{\pi}\sin^{-1}\left(\frac{H}{d_m(t)}\right) \tag{5}$$

The NLOS probability is given by $P_t^{NLOS}(\theta_m(t)) = 1 - P_t^{LOS}(\theta_m(t))$. The uplink and downlink channels are assumed to be approximately the same. The downlink and uplink channel power gains of the communication link between the LEO and the target monitoring node m are denoted $h_m(t)$ and $g_m(t)$, respectively.

s104, constructing a system energy consumption model

Assume the LEO flies at a fixed height $H > 0$, and denote its horizontal position at time slot t by $[x_u(t), y_u(t)]$; changes of position in the vertical direction are not considered in this scenario. The LEO determines its next action in real time while moving and updates its position. Flight control of the LEO in this scenario is described by the flight speed $v(t)$, which is limited by the maximum flight speed, and the yaw angle $\theta(t)$, which is constrained to $\theta(t) \in [-\pi, \pi]$. The energy consumption model of the LEO jointly considers flight energy consumption, coverage service energy consumption and communication energy consumption. The propulsion power consumption of the LEO flying at speed $V$ is calculated by equation (7), in which $P_0$ is the blade profile power in the coverage service state and $U_{tip}$ is the tip speed of the rotor blade; $P_i$ and $v_0$ are the induced power and the mean rotor induced velocity in the coverage service state; and, for the parasitic power, $d_0$, $\rho$, $s$ and $A$ are the fuselage drag ratio, air density, rotor solidity and rotor disc area, respectively. The propulsion power consumption of the LEO comprises the blade profile power, the induced power and the parasitic power, corresponding to the three parts of equation (7). The power consumption during coverage service is obtained by setting $V = 0$:

$$P_{hov} = P(V = 0) = P_0 + P_i \tag{8}$$

The flight energy consumed by the LEO in time slot t follows from this propulsion power.

For the communication energy consumption, the energy transmission loss while the LEO charges the monitoring nodes is the main consideration. In a practical scenario the LEO has a limited energy transmission range, so only the monitoring nodes within the LEO coverage range receive energy transmission and data acquisition during the coverage service phase.

S105, constructing an energy transmission model

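The LEO moves to the target node position through its flight speed and yaw angle decisions and, in the sub-slot $\tau(t)$, transmits a radio frequency signal to the monitoring nodes with transmit power $P_d(t)$, which is limited by the maximum transmit power. Within $\tau(t)$, all monitoring nodes within the LEO energy transmission coverage are charged, and the received power at monitoring node m is determined by $P_d(t)$ and the downlink channel power gain $h_m(t)$.

To better approximate a real scenario, a nonlinear energy transfer model is applied as the air-to-ground energy transfer model. Compared with a linear model, the nonlinear energy transfer model accounts for the saturation limit of the circuit and is therefore more general and practical. Under the RF-EH model, the actual power at the receiving end is limited by the maximum output DC power.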

S2 is shown in FIG. 2, which is the flow chart of the K-Means-based cluster head election algorithm.

The number of monitoring nodes in the urban monitoring network scenario is huge and the nodes are densely distributed. If the LEO traversed all monitoring nodes in turn to perform energy transmission and data acquisition, its energy consumption would be severe; moreover, if monitoring nodes do not receive timely data acquisition and energy transmission services, serious energy holes and data loss result. The problem is therefore divided into two parts, cluster head election and resource allocation. After the nodes are clustered, a suitable monitoring node is selected from each cluster as the cluster head; the cluster head collects the data of the monitoring nodes in its cluster and forwards it to the LEO. The K-Means algorithm is selected to cluster all monitoring nodes, as shown in FIG. 2.

the specific algorithm is shown in the following table:

S201, distance formula design of K-Means algorithm

For the distance formula design of the K-Means algorithm, the similarity of adjacent monitoring nodes in the data generation rate and the energy consumption rate is considered, and the joint monitoring node characteristics and Euclidean distance are taken as joint distances:

Here, the intra-cluster nodes transmit their own monitoring data to the cluster head node, and the LEO transmits energy to all nodes in the cluster during the coverage service stage. The monitoring node serving as the cluster head uploads the monitoring data over the uplink during the time 1 - τ(t). The uplink transmit power of the cluster head node depends on the total energy collected during τ(t), i.e., it is positively related to the collected energy, and can be expressed as:

Thus, the upload data rate for cluster head node k is expressed as:

In each time slot, the LEO selects one cluster head node as the target node for the next service. If the target node is still the current node, the LEO remains in the coverage service state in the next time slot; if the target node changes, the LEO enters the flight state and moves to the position of the target node by deciding its flight speed and deflection angle. The selection of the target node takes the service priority of the node into account, which comprises a data acquisition priority, an energy supply priority and the node distance. The data acquisition priority is set based on the data queue length and the data generation rate of the monitoring node, and the energy supply priority is set in the same way. The service priority of cluster head node k at time slot t is defined as Q_k(t):
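The priority formula is not legible in this text, so the Python sketch below only illustrates the idea of priority-based target selection. It assumes a score that adds a weighted data urgency term and a weighted energy urgency term and subtracts a normalized distance term, using the weights α = 1 and β = 5 stated in the claims. The exact form of each term, the dictionary keys, and the names `service_priority` and `select_target` are hypothetical.

```python
import math

def service_priority(node, leo_xy, d_max, e_max, diag, alpha=1.0, beta=5.0):
    """ASSUMED service priority of one cluster head node.

    node: dict with x, y, queue (data queue length), gen_rate,
          energy (residual energy) and cons_rate.
    """
    # Data urgency: expected queue fill level after one more slot.
    data_term = min(1.0, (node["queue"] + node["gen_rate"]) / d_max)
    # Energy urgency: how close the node is to running out of energy.
    remaining = (node["energy"] - node["cons_rate"]) / e_max
    energy_term = min(1.0, max(0.0, 1.0 - remaining))
    # Distance from the LEO's horizontal position, normalized by the area diagonal.
    dist_term = math.hypot(node["x"] - leo_xy[0], node["y"] - leo_xy[1]) / diag
    return alpha * data_term + beta * energy_term - dist_term

def select_target(heads, leo_xy, d_max, e_max, diag):
    """Return the index of the cluster head with the highest priority."""
    return max(range(len(heads)),
               key=lambda k: service_priority(heads[k], leo_xy, d_max, e_max, diag))
```

With β larger than α, a node that is about to run out of energy outranks a nearby node with a comparable data backlog, which matches the stated weighting.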

S3: FIG. 3 shows the flow chart of the resource allocation strategy for the low-orbit satellite LEO-assisted urban monitoring network.

The research goal is to find a dynamic resource allocation strategy based on the antenna switching structure in the LEO-assisted urban monitoring network scenario. The strategy must jointly consider three aspects to optimize network performance: the data acquisition requirement, the energy transmission requirement and LEO energy consumption. The data acquisition requirement is represented mainly by the total amount of data the LEO collects from the monitoring nodes; the energy transmission requirement is the total amount of energy transmitted by the LEO; and LEO energy consumption is the sum of the communication, flight and coverage service energy consumption of the LEO in one task period. The problem is therefore multi-objective: maximize the total data acquisition amount and the total energy transmission amount in the urban monitoring network while minimizing LEO energy consumption. When deciding the LEO flight path and the coverage service positions, the state of the monitoring nodes and the LEO energy consumption are both considered, so that data overflow and energy holes at the monitoring nodes are avoided as far as possible. The LEO is set to visit the monitoring nodes for service in order of their real-time service priority, i.e., in time slot t the target node of the LEO is selected based on formula (15).

(1) Data acquisition amount

After the monitoring nodes are clustered, the monitoring nodes in each cluster send the buffered data in their data transmission queues to the cluster head node in a single-hop or multi-hop manner, so the LEO collects data through the cluster head nodes. While the LEO provides coverage service to cluster head node k in time slot t, the corresponding data acquisition amount can be expressed as:

$$D_k(t) = R_k(t)\,\bigl(1-\tau(t)\bigr) \tag{16}$$
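Formula (16) can be illustrated with a short Python sketch. It assumes a slot of normalized length 1, so that τ(t) is the downlink fraction and 1 - τ(t) the uplink fraction, and it accumulates data only in slots where the LEO is in the coverage service state, since the LEO does not communicate with the nodes during flight. The function names and the tuple layout are illustrative assumptions.

```python
def data_collected(rate, tau):
    """Formula (16): D_k(t) = R_k(t) * (1 - tau(t)) for one coverage slot."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return rate * (1.0 - tau)

def total_data(slots):
    """Sum the collected data over a task period.

    slots: iterable of (rate, tau, covering) tuples, one per time slot;
    covering is False while the LEO is flying (nothing is collected).
    """
    return sum(data_collected(rate, tau)
               for rate, tau, covering in slots if covering)
```

The sketch makes the trade-off explicit: raising τ(t) gives the nodes more charging time but shrinks the uplink window, so less data is collected in that slot.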

The total amount of data the LEO collects from the monitoring nodes in the task period T can be expressed as:

(2) Energy transmission quantity

During time slot t, the LEO sends data acquisition information to cluster head node k and to the other monitoring nodes within its coverage area, and transmits energy to them. During the coverage service of cluster head node k, the energy transmission amount of the LEO in time slot t can be expressed as:

The total amount of energy transmitted by the LEO during the task period T can thus be expressed as:

(3) Low orbit satellite LEO energy consumption

According to the state of the LEO in time slot t, LEO energy consumption is classified into flight energy consumption and coverage service energy consumption. The energy consumption during coverage service includes the LEO coverage service energy consumption and the total downlink transmission energy, and can be expressed as:

If the LEO is in the flight state, its energy consumption is expressed as E^uav(t) = P(v)t, so the energy consumption of the LEO in a task cycle can be expressed as:
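The propulsion power P(v) is described in the claims as the sum of blade profile, induced and parasitic power, but formula (7) is not legible in this text. The Python sketch below assumes the widely used three-term rotary-wing propulsion model written with the same symbols (P_0, P_i, U_tip, v_0, d_0, ρ, s, A); this matches formula (8), since the expression reduces to P_0 + P_i at zero speed. The function names and the `slot_duration` parameter are illustrative assumptions.

```python
import math

def propulsion_power(v, p0, p_i, u_tip, v0, d0, rho, s, area):
    """ASSUMED three-term propulsion power at speed v.

    blade profile + induced + parasitic power; reduces to p0 + p_i at v = 0.
    """
    blade = p0 * (1.0 + 3.0 * v**2 / u_tip**2)
    # max(0, ...) guards against tiny negative values from rounding at high speed.
    inner = math.sqrt(1.0 + v**4 / (4.0 * v0**4)) - v**2 / (2.0 * v0**2)
    induced = p_i * math.sqrt(max(0.0, inner))
    parasitic = 0.5 * d0 * rho * s * area * v**3
    return blade + induced + parasitic

def flight_energy(v, slot_duration, **params):
    """Energy spent flying at constant speed v for one slot: P(v) * duration."""
    return propulsion_power(v, **params) * slot_duration
```

Evaluating `propulsion_power(0.0, ...)` gives the coverage service power of formula (8), so the same function can be used for both the flight state and the coverage service state.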

Meanwhile, to reflect the actual scenario and to minimize data overflow and energy holes, a data threshold is set: when the data volume in a monitoring node exceeds this threshold, data overflow is considered to occur. Similarly, an energy threshold is set: when the energy of a monitoring node falls below this threshold, an energy hole is considered to occur.

The optimization aims to maximize the uplink data collection amount and the downlink energy transmission amount by jointly optimizing LEO flight decision and resource allocation, and reduce LEO energy consumption as much as possible.

P1:

In the constraints of the optimization problem, C1 and C2 are the flight speed and deflection angle constraints of the LEO, C3 is the LEO transmit power constraint, and C4 and C5 are the constraints on the monitoring node data queue and energy queue, respectively. Among the optimization objectives, maximizing the data collection amount depends mainly on how the LEO allocates time slots and power during coverage service at the target monitoring node position; that is, optimizing the time slot and power allocation can increase the amount of data uploaded by the current target monitoring node. However, excessive resource allocation increases LEO energy consumption. Maximizing the total energy transmission likewise depends on the LEO's allocation of time slots and power during coverage service: allocating more time and power provides the monitoring nodes with sufficient energy but consumes more LEO energy.

Based on the above analysis, the three optimization objectives conflict to some extent. Finding the best coverage service positions, making flight decisions and optimizing resource allocation decisions is a complex process with considerable computational cost. Furthermore, because the environment is only partially observable, conventional model-based methods such as dynamic programming cannot solve this problem effectively. Given the dense distribution of monitoring nodes in this scenario, the DQN algorithm is not applicable because it does not handle continuous action spaces, whereas DDPG, a classical DRL algorithm, has been shown to learn effective policies in continuous action spaces from low-dimensional observations and is therefore suitable for the LEO flight decision problem. Because the original DDPG algorithm uses a scalar reward, it is extended here to multidimensional rewards to fit the multi-objective optimization problem. The result is a Multi-objective Joint DDPG (MJDDPG) algorithm for low-orbit satellite LEO-assisted urban monitoring network data acquisition and energy transmission, in which weight parameters are introduced to describe the preference among optimization objectives.

FIG. 5 shows the architecture of the DDPG algorithm. Unlike the original DDPG, which addresses a single-objective MDP with a scalar reward signal, the rewards in the MJDDPG experience tuples are vectors. Since the value of an action depends on the preference among competing objectives, a linear weighting method is used here to compute a weighted sum of the elements of the reward vector. The corresponding reward value is expressed as r = r w^T, where the right-hand side is the product of the reward vector r and the transposed weight vector w, so the reward vector is converted to scalar form during the calculation.

In the setting here, all weight parameters are selected in the interval [0.0, 1.0] according to the importance preference of each optimization objective, and the rest of the network structure is the same as in the DDPG algorithm. For the target network, the target value y_t is calculated as follows:

$$y_t = \mathbf{r}\,\mathbf{w}^{T} + \gamma\, Q'\bigl(s_{t+1},\ \mu'(s_{t+1}\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr) \tag{28}$$
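Formula (28) can be illustrated with a short Python sketch in which the target actor and target critic are passed in as plain callables. The reward vector is assumed to be ordered as [r_dc, r_eh, r_ec, r_aux], following the reward terms named in the claims; the function names and this ordering are illustrative assumptions, and no particular deep learning library is implied.

```python
def scalarize(reward_vec, weights):
    """Linear weighting: scalar reward = r w^T (weighted sum of the elements)."""
    if len(reward_vec) != len(weights):
        raise ValueError("reward vector and weight vector must have equal length")
    return sum(r * w for r, w in zip(reward_vec, weights))

def td_target(reward_vec, weights, gamma, next_state, target_actor, target_critic):
    """Formula (28): y_t = r w^T + gamma * Q'(s_{t+1}, mu'(s_{t+1})).

    target_actor(state) -> action and target_critic(state, action) -> float
    stand in for the target networks mu' and Q'.
    """
    next_action = target_actor(next_state)
    return scalarize(reward_vec, weights) + gamma * target_critic(next_state, next_action)
```

Because the weighting is applied before the target is formed, the critic still learns a scalar value and the rest of the DDPG update can stay unchanged.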

The complete MJDDPG algorithm is shown in Algorithm 2.

S4: the Markov decision process problem model is shown in FIG. 4.

S401, defining a state space

The state includes the relative distance between the target monitoring node and the LEO in the Cartesian coordinate system. After the LEO completes the service of the current target node, a new monitoring node is selected as the target node according to the current state of the system, so this component helps direct the LEO to bring the target monitoring node into its coverage. N_f(t) records the cumulative number of times, up to time t, that the LEO has exceeded the restricted region during the task period. Bounding the absolute position of the LEO, [x_u(t), y_u(t)], prevents the LEO from flying out of the designated area and wasting resources. The number of data-loss nodes N_d(t) and the number of power-off nodes N_e(t) prompt the LEO to serve high-demand monitoring nodes in time, and the amount of data to be uploaded at the current target node guides the LEO to make effective decisions on the allocation of time slots and power. Thus, the definition of the state space is as follows:

where, at time t = 0, N_f(t), N_d(t) and N_e(t) are all 0.

S402, defining an action space

In the research scenario, the state space needs to be mapped to a continuous action space. In the LEO-assisted urban monitoring network scenario, multi-objective optimization is achieved by jointly optimizing LEO flight decisions, time slot allocation and power allocation. Based on the current system state and environment, the action selected by the LEO at time slot t comprises the flight speed v(t), the flight angle θ(t), the time slot allocation τ(t) and the transmit power allocation p_d(t) of the LEO; all action variables are continuous. Thus, the action that the LEO, as the agent, can take in time slot t can be expressed as:
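As a minimal sketch of how such a continuous action could be produced, the Python code below assumes that the actor network outputs four values in [-1, 1] (for example from a tanh layer) and rescales them to the ranges given in the claims: v(t) in [0, v_max], θ(t) in [-π, π], τ(t) in [0, 1] and p_d(t) in [0, P_max]. The function names, the output ordering and the simple position update are illustrative assumptions.

```python
import math

def scale_action(raw, v_max, p_max):
    """Map four raw actor outputs in [-1, 1] to (v, theta, tau, p_d)."""
    u = [min(1.0, max(-1.0, x)) for x in raw]   # clamp defensively
    v = (u[0] + 1.0) / 2.0 * v_max              # flight speed in [0, v_max]
    theta = u[1] * math.pi                      # yaw angle in [-pi, pi]
    tau = (u[2] + 1.0) / 2.0                    # downlink fraction in [0, 1]
    p_d = (u[3] + 1.0) / 2.0 * p_max            # transmit power in [0, p_max]
    return v, theta, tau, p_d

def step_position(x, y, v, theta, dt):
    """Update the LEO's horizontal position after flying for dt at speed v."""
    return x + v * dt * math.cos(theta), y + v * dt * math.sin(theta)
```

Rescaling inside the environment wrapper keeps the actor's output range fixed, so the same network can be reused when v_max or P_max changes.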

S403, establishing a reward function

The reward function provides the quantitative evaluation of an action taken by the agent in reinforcement learning. A suitable reward function is particularly important to the performance of a deep reinforcement learning algorithm, and the LEO learns its control policy through a well-designed reward function. The optimization goal is to optimize network performance by maximizing the data acquisition amount and the energy transmission amount and minimizing LEO energy consumption, while guaranteeing network quality. The reward can thus be defined as a multidimensional vector:

Depending on how the LEO is serving the node at time slot t, the prize value may be expressed as:

It can be seen that the first two terms in r_aux(t) relate to the distance between the LEO and the target monitoring node. The farther the LEO is from the target monitoring node, the smaller the penalty term will be, which helps the LEO identify the location of the target monitoring node and approach it. In addition, a negative reward is obtained if the LEO attempts to fly out of the area, or if monitoring node data overflows or node energy is exhausted because service was not timely. Penalizing erroneous LEO flight decisions in this way encourages the LEO to preserve network quality during learning. In the simulation environment here, the weight w_aux corresponding to r_aux(t) is always set to 1.

Claims

1. The city monitoring-oriented satellite energy-carrying Internet of things resource optimization allocation method is characterized by comprising the following steps of:

s1, constructing a low orbit satellite LEO auxiliary city monitoring network model, wherein a specific scene is that single low orbit satellite LEO provides energy transmission and data acquisition service for multiple monitoring nodes through mobile deployment; the LEO is provided with a single antenna, the node is provided with a plurality of antennas, and information decoding and energy collection are respectively carried out based on an antenna switching structure;

constructing a transmission queue model of the system;

establishing a probability channel model by comprehensively considering the occurrence probability of the sight link and the non-sight link channels, and taking the probability channel model as a channel model of LEO and a ground monitoring node;

the LEO determines the next action in real time in the moving process, updates the position of the LEO, and jointly considers the flight energy consumption, the coverage service energy consumption and the communication energy consumption to construct an energy consumption model of the system;

the LEO moves to the position of the target node through decisions on flight speed and deflection angle, and sends a radio frequency signal to the monitoring nodes at a specific transmit power in a sub-time slot; in that sub-time slot, all monitoring nodes within the satellite energy transmission coverage are charged, and an energy transmission model of the system is built accordingly;

S1 specifically comprises the following steps:

The scene is that a single LEO provides energy transmission and data acquisition service for a plurality of monitoring nodes through mobile deployment; the LEO is provided with a single antenna, the node is provided with a plurality of antennas, and information decoding and energy collection are respectively carried out based on an antenna switching structure;

the duration of each flight task is T > 0, and the total time is divided into equal-length time slots, i.e., t = 1, 2, ..., T; the LEO adopts a flight-coverage service communication protocol, under which the LEO does not communicate with the monitoring nodes during flight and performs energy transmission and data acquisition on the monitoring nodes only during coverage service; the coverage service time slot is divided into two parts corresponding to uplink and downlink communication respectively: in the downlink sub-time slot, the LEO transmits information to the monitoring nodes in the cluster and simultaneously transmits energy; the monitoring nodes receive the radio frequency signal through the antenna switching structure to decode information and collect energy at the same time; in the uplink sub-time slot, the cluster head node uploads monitoring data to the LEO;

s102, constructing a transmission queue model

In this scenario, the monitoring nodes are indexed by m, and the position of node m is represented as [x_m, y_m]; for each monitoring node m, λ_m(t) denotes the data generation rate of node m while executing the monitoring task in time slot t; it is assumed that the λ_m(t) of different nodes follow a Poisson distribution whose parameter is constant during the monitoring task, i.e., λ_m(t) = λ_m; the length of the data transmission queue of node m in time slot t is expressed as:

where the maximum storage capacity of the data transmission queue is assumed to be the same for all nodes; when the queue length exceeds this maximum capacity, newly collected data cannot be placed in the node data buffer and is discarded, causing data overflow;

for the energy transmission requirement, consider the residual energy of monitoring node m at time slot t, and let μ_m(t) denote the energy consumption rate of the node in time slot t; μ_m(t) is the same in different time slots, i.e., μ_m(t) = μ_m, while, because of differing hardware factors and deployment locations, μ_m differs between monitoring nodes; the residual energy is expressed as:

where the maximum storage capacity of the energy queue is assumed to be the same for all nodes; when the energy of a monitoring node is exhausted, the node cannot provide normal service and an energy hole occurs;

s103, constructing a system channel model

representing the path loss exponent; μ^NLOS is the attenuation coefficient of the NLOS link;

for monitoring node m, the LOS probability at time t is:

the non-line-of-sight (NLOS) link probability is the complement of the LOS probability; the downlink channel power gain and the uplink channel power gain of the communication link between the LEO and target monitoring node m are denoted h_m(t) and g_m(t), respectively; that is, the channel power gain between the LEO and the target node is expressed as:

s104, constructing a system energy consumption model

Assuming the LEO flies at a fixed height H > 0, its horizontal position at time slot t is denoted [x_u(t), y_u(t)]; the LEO determines its next action in real time while moving and updates its position; the flight control of the LEO in this scenario is described by a flight speed v(t), limited by the maximum flight speed, and a yaw angle θ(t), limited to θ(t) ∈ [-π, π]; the energy consumption model of the LEO jointly considers flight energy consumption, coverage service energy consumption and communication energy consumption, wherein the propulsion power consumption of the LEO at speed V during flight is calculated by the following formula:

where P_0 is the blade profile power in the coverage service state and U_tip is the tip speed of the rotor blade; P_i and v_0 denote the induced power and the mean rotor induced velocity under coverage service conditions; for the parasitic power, d_0, ρ, s and A denote the fuselage drag ratio, air density, rotor solidity and rotor disc area, respectively; the propulsion power consumption of the LEO comprises the blade profile power, the induced power and the parasitic power, corresponding to the three parts of formula (7); the coverage service power consumption is obtained by setting V = 0:

$$P_{hov} = P(V=0) = P_0 + P_i \tag{8}$$

the flight expended energy of the LEO in time slot t is expressed as:

in the coverage service stage, carrying out energy transmission and data acquisition on the monitoring nodes in the LEO coverage area;

s105, constructing an energy transmission model

the LEO transmit power p_d(t) is limited to the range [0, P_max]; within τ(t), all monitoring nodes within the LEO energy transmission coverage are charged, and the received power at monitoring node m is expressed as:

Applying a nonlinear energy transmission model as an air-ground energy transmission model; by the RF-EH model, the actual power of the receiving end is expressed as:

where P_limit is the maximum output direct current power, and c and d are constants related to the circuit characteristics;

s2, dividing all M monitoring nodes into K clusters by using a K-Means algorithm, selecting a proper monitoring node from each cluster as a cluster head, collecting data of monitoring nodes in the clusters by the cluster head nodes, and forwarding the data to LEO; LEO transmits energy to all nodes in the cluster in the coverage service stage; in each time slot, LEO selects a cluster head node as a target node of the next service; the service priority of the node is considered for the selection of the target node;

s2 specifically comprises the following steps:

s201, distance formula design of K-Means algorithm

a and b are the length and the width of the city monitoring network model respectively;

the uplink transmit power of the cluster head node depends on the total energy collected during τ(t), i.e., it is positively related to the collected energy, and is expressed as:

wherein ζ represents energy conversion efficiency, which is a constant value;

the upload data rate for cluster head node k is expressed as:

in each time slot, the LEO selects a cluster head node as the target node of the next service; if the target node is still the current node, the LEO remains in the coverage service state in the next time slot; if the target node changes, the LEO enters the flight state and moves to the position of the target node by deciding its flight speed and deflection angle; the selection of the target node takes the service priority of the node into account, which comprises a data acquisition priority, an energy supply priority and the node distance; the data acquisition priority is set based on the data queue length and the data generation rate of the monitoring node, and the energy supply priority is set in the same way; the service priority of cluster head node k in time slot t is defined as Q_k(t):

Wherein the data acquisition priority and the energy supply priority weights are α=1 and β=5, respectively;

s3, comprehensively considering three aspects of data acquisition requirements, energy transmission requirements and low orbit satellite LEO energy consumption, realizing maximization of uplink data collection quantity and downlink energy transmission quantity through combined optimization of LEO flight decision and resource allocation, defining a multi-objective optimization problem, and optimizing;

S3 specifically comprises the following steps:

(1) Data acquisition amount

$$D_k(t) = R_k(t)\,\bigl(1-\tau(t)\bigr) \tag{16}$$

(2) Energy transmission quantity

(3) Low orbit satellite LEO energy consumption

if the LEO is in the flight state, its energy consumption is expressed as E^uav(t) = P(v)t, and the energy consumption of the LEO in a task cycle is expressed as:

a data threshold is set, i.e., when the data volume in a monitoring node exceeds this threshold, data overflow is considered to occur; an energy threshold is set, i.e., when the energy of a monitoring node falls below this threshold, an energy hole is considered to occur; the data acquisition priority and energy supply priority weights are α = 1 and β = 5, respectively;

the optimization aims at maximizing the uplink data collection amount and the downlink energy transmission amount by jointly optimizing LEO flight decisions and resource allocation;

in constraint conditions of the optimization problem, C1 and C2 are flight speed and deflection angle constraints of LEO, C3 is LEO transmitting power constraints, and C4 and C5 are constraints of a monitoring node data queue and an energy queue respectively;

adopting a multi-objective joint optimization MJDDPG algorithm for low-orbit satellite LEO-assisted urban monitoring network data acquisition and energy transmission, and describing the optimization objective preference by introducing weight parameters;

in the MJDDPG algorithm, the corresponding reward value is expressed as r = r w^T, where the right-hand side is the product of the reward vector r and the transposed weight vector w, converting the reward vector into scalar form; all weight parameters are selected in the interval [0.0, 1.0] according to the importance preference of each optimization objective, and the rest of the network structure is the same as in the DDPG algorithm; for the target network, the target value y_t is calculated as follows:

$$y_t = \mathbf{r}\,\mathbf{w}^{T} + \gamma\, Q'\bigl(s_{t+1},\ \mu'(s_{t+1}\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr) \tag{28}$$

s4, constructing a problem model according to a Markov decision process; the state space of the system is described first; an action space is then constructed based on the system state and environment in the low-orbit satellite LEO-assisted urban monitoring network scenario, the action selected by the LEO in a specific time slot including the flight speed, the flight angle, the time slot allocation and the transmit power allocation of the LEO; meanwhile, a reward function is established as the quantitative evaluation of the action taken by the agent in reinforcement learning;

S4 specifically comprises the following steps:

s401, defining a state space

the state includes the relative distance between the target monitoring node and the LEO in the Cartesian coordinate system; N_f(t) records the cumulative number of times, up to time t, that the LEO has exceeded the restricted area during the task period; the absolute position of the LEO is [x_u(t), y_u(t)]; the number of data-loss nodes N_d(t) and the number of power-off nodes N_e(t) prompt the LEO to serve high-demand monitoring nodes in time; the amount of data to be uploaded at the current target node guides the LEO to make effective decisions on time slot and power resource allocation; the state space is defined as follows:

where, at time t = 0, N_f(t), N_d(t) and N_e(t) are all 0;

s402, defining an action space

the state space is mapped to a continuous action space, and multi-objective optimization is achieved in the LEO-assisted urban monitoring network scenario by jointly optimizing LEO flight decisions, time slot allocation and power allocation; based on the current system state and environment, the action selected by the LEO at time slot t comprises the flight speed v(t), the flight angle θ(t), the time slot allocation τ(t) and the transmit power allocation p_d(t) of the LEO, and all action variables are continuous; the action that the LEO, as the agent, can take in time slot t is expressed as:

where the flight speed v(t) and the yaw angle θ(t) lie in the intervals [0, v_max] and [-π, π], respectively; the yaw is represented through cos(θ(t)); p_d(t) lies in the range [0, P_max]; and τ(t) is the proportion of a single time slot allocated to downlink energy transmission;

s403, establishing a reward function

The optimization aim is to optimize the network performance by maximizing the data acquisition amount, the energy transmission amount and minimizing the LEO energy consumption on the premise of guaranteeing the network quality; rewards are defined as multidimensional vectors:

where r_dc(t), r_eh(t) and r_ec(t) are the optimization objectives and r_aux(t) is a penalty term;

If in the coverage service state, the coverage service energy consumption and the communication energy consumption are included, namely

once the target monitoring node falls within the coverage radius of the LEO, the LEO enters coverage service and performs data acquisition and energy transmission; otherwise, the LEO is in the flight phase and r_dc(t) and r_eh(t) are both 0; furthermore, the auxiliary penalty function r_aux(t) is:

the first two terms in r_aux(t) are the relative distances between the target monitoring node and the LEO in the Cartesian coordinate system; the LEO is motivated to preserve network quality during learning by penalizing its wrong flight decisions.