CN113365222A

CN113365222A - Mobile sensor intelligent track design method based on sustainable data acquisition

Info

Publication number: CN113365222A
Application number: CN202110916516.0A
Authority: CN
Inventors: 贾日恒; 张秀铃; 林飞龙; 郑忠龙
Original assignee: Zhejiang Normal University CJNU
Current assignee: Zhejiang Normal University CJNU
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2021-09-07
Anticipated expiration: 2041-08-11
Also published as: CN113365222B

Abstract

The invention discloses a mobile sensor intelligent track design method based on sustainable data acquisition, which comprises the following steps: S1, a mobile sensor collects energy from the surrounding environment and uses the collected energy for its own movement and for data transmission; S2, a model relating energy collection and data transmission is constructed based on a Markov decision process; and S3, the constructed model is solved with a deep deterministic policy gradient (DDPG) algorithm to obtain the optimal joint trajectory-movement and energy (power) allocation strategy. With the goal of maximizing long-term average data collection throughput, the algorithm is designed so that, with the environmental energy unknown, the sensor intelligently tracks the optimal arrival point for each time slot over the whole two-dimensional area and then performs maximized energy collection and data transmission in that time slot.

Description

Mobile sensor intelligent track design method based on sustainable data acquisition

Technical Field

The invention relates to the technical field of wireless information transmission, in particular to a mobile sensor intelligent track design method based on sustainable data acquisition.

Background

A wireless sensor network (WSN) is a distributed sensing network whose end nodes are sensors that can sense and monitor the outside world. The sensors in a WSN communicate wirelessly, so the network layout is flexible, device positions can be changed at any time, and the network can be connected to the Internet by wired or wireless means. Each node in the network has sensing capability, and WSNs are often used in scenarios such as ecological environment monitoring, intelligent security patrol, and forest temperature and humidity data acquisition. However, the data acquired by the nodes often faces many challenges during transmission, because the capability of each sensor node is limited and the area in which the network is deployed is generally complex and unfavorable to data transmission. Therefore, a mobile data acquisition device is placed into the area where the wireless sensor network is deployed to collect the data acquired by the scattered sensor nodes. Because the mobile sensor consumes energy for movement and data transmission, and frequent battery replacement is impractical in complex applications, energy harvesting (EH) technology allows a WSN to sustain itself and maintain long-term key performance indicators such as data throughput and transmission coverage.
Wireless charging technology is therefore introduced so that the mobile sensor obtains energy directly from the surrounding environment and becomes self-sufficient. The wireless sensor network can then operate indefinitely without replaceable batteries or a fixed power grid, which greatly improves the utilization of wireless sensor devices deployed on a large scale. In practical applications, however, the environmental energy available to a wireless sensor is usually unknown, and the energy collection process is random and dynamic. These uncertainties affect long-term key performance indicators of the sensor network such as data throughput, sensing coverage and data transmission. Efficient learning algorithms are therefore needed that enable mobile sensors to adapt toward the goal of sustainable data collection.

To address the above problems, the invention provides a mobile sensor intelligent track design method based on sustainable data acquisition.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a mobile sensor intelligent track design method based on sustainable data acquisition. With the goal of maximizing long-term average data collection throughput, the algorithm is designed so that, with the environmental energy unknown, the sensor intelligently tracks the optimal arrival point for each time slot over the whole two-dimensional area and then performs maximized energy collection and data transmission in that time slot.

In order to achieve the purpose, the invention adopts the following technical scheme:

A mobile sensor intelligent track design method based on sustainable data acquisition comprises the following steps:

S1, a mobile sensor collects energy from the surrounding environment and uses the collected energy for its own movement and for data transmission;

S2, a model relating energy collection and data transmission is constructed based on a Markov decision process;

S3, the constructed model is solved with a deep deterministic policy gradient algorithm to obtain the optimal joint trajectory-movement and energy (power) allocation strategy.

Further, in step S1 the mobile sensor collects energy from the surrounding environment as follows: time is divided into equally spaced time slots, and the mobile sensor collects energy in each time slot.

Further, the model constructed in step S2 consists of T time-slot models; each of the T time-slot models comprises a data transmission model, an energy collection model and a mobility model.

Further, the data transmission model is represented as:

wherein J_t represents the data transmission model (the data transmitted in time slot t); p_t represents the energy used for data transmission; l_t represents the position of the mobile sensor; l_s represents the position of the data receiver; |l_t - l_s| represents the distance from the mobile sensor to the data receiver; and α represents a distance loss factor.

Further, the energy harvesting model is represented as:

wherein the quantities in the model are, in order: the amount of energy harvested by the mobile sensor from energy source E1 at time t; the amount of energy harvested by the mobile sensor from energy source E2 at time t; the coordinate position of energy source E1; the coordinate position of energy source E2; β, which represents an upper bound that limits the energy the mobile sensor can harvest; and the total energy that the mobile sensor at its current position can harvest from the two energy sources at time t.

Further, the mobility model comprises a mobility consumption model and a location update model;

the mobile consumption model is represented as:

wherein C_t represents the movement energy consumption of the mobile sensor; ρ_t represents the moving speed of the mobile sensor; and the remaining coefficient represents the energy consumed per unit speed;

the location update model is represented as:

wherein x_{t+1}, y_{t+1} represent the coordinates of the mobile sensor in the two-dimensional coordinate system at the next time t+1; x_t, y_t represent the position coordinates of the mobile sensor at time t; φ_t represents the direction at time t; and the rotation angle selected in the action indicates the direction of movement of the mobile sensor.

Further, each of the timeslot models is represented as:

wherein B_{t+1} represents each time-slot model, i.e. the battery energy carried into time t+1; B_max represents the maximum battery capacity of the mobile sensor; and B_t represents the remaining battery energy of the mobile sensor at time t.

Further, the T slot models are represented as:

subject to:

wherein l_t and l_{t+1} represent the positions of the mobile sensor at time t and time t+1, respectively; and p_{t+1} represents the energy used by the mobile sensor for data transmission at time t+1.

Compared with the prior art, the invention has the beneficial effects that:

(1) the method generalizes well: using the trained parameter model, the mobile sensor can locate a better collection point for energy collection and data transmission by itself, even in an unknown area;

(2) a two-layer fully connected deep neural network is used to approximate the value function, and an actor-critic algorithm is adopted to solve the policy optimization problem over a continuous action space;

(3) exploration requires no knowledge of the energy distribution or any other information about the energy: for different initial positions of the mobile sensor and different positions of the data receiver (sink), the mobile sensor intelligently searches for an optimal trajectory to the theoretically optimal position for energy collection and data transmission, although a high movement cost per unit distance may prevent the mobile sensor from identifying the optimal trajectory and lead to a suboptimal solution;

(4) the model uses continuous movement to match a real physical environment with momentum, in which the moving speed and rotation angle can take any value within a given range. Accordingly, a temporally correlated exploration strategy is provided to suit the physical control process of a mobile sensor with inertia, which improves exploration efficiency during training;

(5) the method has technical significance for making energy-harvesting wireless sensor networks self-sustaining and for coping with the randomness and unpredictability of environmental energy, and it is valuable for large-scale deployment and use of wireless sensor networks, for maximizing the use of environmental resources, and for reducing deployment cost.

Drawings

FIG. 1 is a flowchart of a mobile sensor intelligent trajectory design method based on sustainable data collection according to an embodiment;

FIG. 2 is a schematic diagram of a mobile model of a mobile sensor for collecting energy and transmitting data according to an embodiment;

FIG. 3 is a system interaction model diagram of an actor-critic-based reinforcement learning framework provided by an embodiment;

FIG. 4 is a diagram of dividing each time slot t into three sub-time slots according to the first embodiment;

FIG. 5 is a schematic diagram of an algorithm provided in accordance with one embodiment;

FIG. 6 is a schematic diagram of a trajectory of the mobile sensor provided in the second embodiment.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

The invention aims to provide a mobile sensor intelligent track design method based on sustainable data acquisition, aiming at the defects of the prior art.

Example one

The embodiment provides a mobile sensor intelligent track design method based on sustainable data acquisition, as shown in fig. 1, including the steps of:

In step S1, the mobile sensor collects energy from the surrounding environment and uses the collected energy for its own movement and for data transmission.

The mobile sensor in this embodiment collects energy from its surroundings and uses it for its own movement; while moving it collects data from its surroundings, such as temperature readings obtained by temperature sensors, and transmits the collected data to the data receiver. Energy is also consumed during transmission, so the energy collected by the mobile sensor is used for data transmission as well.

The present embodiment considers the throughput maximization problem of an energy-harvesting wireless sensor network in which only random system information is available. In particular, consider a mobile sensor whose primary function is to move through the network, collect data, and transmit the collected data to a data receiver (sink). The mobile sensor can replenish itself by harvesting energy from the surrounding environment. As shown in FIG. 2, assume that the surrounding environment has two energy sources, energy source 1 and energy source 2 (the embodiment is not limited to two energy sources; there may be more), from which the mobile sensor can obtain energy. Specifically, time is divided into equally spaced time slots, and the mobile sensor acquires energy in each time slot.

In this embodiment, the amount of energy captured by the mobile sensor is determined by the following factors:

Energy sources: it is assumed that energy sources 1 and 2 are independent and that each randomly generates a certain amount of energy in each time slot. Thus, even if the mobile sensor stays at the same location for a period of time, the energy it receives may vary, since the energy is typically affected by environmental changes.

Position of the mobile sensor: as the mobile sensor moves within the wireless sensor network, its location determines how much energy is available from energy sources 1 and 2. For example, the closer the mobile sensor is to energy source 1, the more energy it can capture from energy source 1, since the power at which the mobile sensor captures energy is generally inversely proportional to the distance between the energy source and the mobile sensor.

Trade-off between energy harvesting and data transmission: the closer the mobile sensor is to an energy source, the more energy it acquires and the more energy it can use for data transmission. However, approaching the energy source may increase the distance from the mobile sensor to the data receiver, which may degrade data transmission.

In step S2, a model relating energy collection and data transmission is constructed based on a Markov decision process.

In response to the above problem, the present embodiment first jointly models energy collection and data transmission as a Markov decision process (MDP), since at the beginning of each time slot the mobile sensor decides its upcoming power allocation and movement based on the current situation. Then, based on the MDP formulation, a deep deterministic policy gradient algorithm is used to identify an optimal joint trajectory-movement and power allocation policy (i.e., a function that maps the current situation to the best decision in each slot) so as to maximize the long-term average data throughput from the mobile sensor to the data receiver.

The present embodiment treats the mobile sensor as an intelligent agent that learns and optimizes its decisions through continuous interaction with the environment. S and A are defined as the state space and the action space, respectively, containing all possible states and actions. The state transition probability P(s_{t+1} | s_t, a_t) is the probability that executing action a_t in state s_t causes a transition to state s_{t+1}; R denotes the reward; and the parameter γ is the discount factor, which represents the degree of importance attached to future rewards. In this model, the MDP M = {S, A, P, R, γ} is defined to model the interaction process of the mobile sensor, as shown in FIG. 3.

State (S): the state of the system at the start of time slot t is defined as s_t = {B_t, x_t, y_t, φ_t}, comprising the remaining battery energy of the mobile sensor, its position coordinates and its current direction. By the Markov property, the state s_t carries all the historical state information prior to t, and the mobile sensor can make further movement and power allocation decisions based on this information.

Action (A): an action is defined as the movement decision made at each position. The movement decision consists of two parts: the direction of movement, determined by selecting a rotation angle, and the moving speed ρ_t ∈ [0, ρ_max]. These two parameters make up the movement decision for time slot t. In addition, the mobile sensor needs to determine the amount of energy p_t used to transmit data to the data receiver. The action taken by the mobile sensor in each time slot is therefore a_t = {p_t, rotation angle, ρ_t}. At the beginning of each time slot, it is assumed that all the energy remaining in the battery of the mobile sensor is used for data transmission, i.e. p_t = B_t, so the action is redefined as a_t = {rotation angle, ρ_t}.

Reward (R): since the ultimate goal of this embodiment is to maximize the long-term average data throughput from the mobile sensor to the data receiver, the reward is tied to the amount of data transmitted per slot. The reward r_t available to the mobile sensor in each time slot is defined as the ratio of the remaining battery energy of the mobile sensor at the current time to the distance between the mobile sensor and the data receiver raised to the power α, i.e. r_t = B_t/|l_t - l_s|^α. The main reasons are as follows. On the one hand, the data throughput of each time slot is positively related to the remaining battery energy B_t of the mobile sensor: if more energy can be collected and stored in advance, more energy can subsequently be allocated to data transmission. On the other hand, the data throughput of each time slot is inversely related to the distance |l_t - l_s| from the mobile sensor to the data receiver. Seeking a higher reward r_t therefore drives the mobile sensor toward locations where more energy can be harvested while the distance to the data receiver stays short. Note that r_t is the only feedback the mobile sensor can observe from the environment after performing an action; explicit information about the energy sources and the data receiver is not available.

State transition probability: the state transition probabilities characterize the system dynamics from one time slot to the next. Because the state space and the action space are continuous, the space of state transition probabilities is also continuous and infinite and is difficult to obtain explicitly, so traditional offline optimization techniques cannot derive the optimal movement and power allocation strategy.
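The state, action and reward described above can be sketched in code as follows. This is a minimal illustration rather than the patent's implementation: the class and function names, the inclusion of the heading in the state, the default α = 2 and the small `eps` guard against a zero distance are assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """s_t: remaining battery energy, position and heading of the mobile sensor."""
    battery: float
    x: float
    y: float
    heading: float


@dataclass(frozen=True)
class Action:
    """a_t after setting p_t = B_t: a rotation angle and a moving speed."""
    rotation: float
    speed: float


def reward(state, sink, alpha=2.0, eps=1e-9):
    """r_t = B_t / |l_t - l_s|^alpha; eps (assumed) avoids division by zero."""
    distance = math.dist((state.x, state.y), sink)
    return state.battery / max(distance, eps) ** alpha
```

For a fixed battery level the reward grows as the sensor approaches the sink, and for a fixed position it grows with the stored energy, which is the trade-off the text describes.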

In light of the foregoing, the present embodiment extends the energy harvesting-based wireless sensor network into a two-dimensional rectangular area, and the mobile sensor extracts energy from two environmental energy sources while transmitting the collected data to the data receiver via a wireless channel. In order to obtain better energy harvesting and data transmission performance, optimization of mobility strategies is required.

The mobile sensor can obtain energy from the surrounding environment, such as solar or wind energy. It obtains energy from energy sources 1 and 2 to sustain its own movement and data transmission. In this embodiment, the whole horizon is first divided into T discrete time slots of equal length, and each time slot t is divided into three sub-slots, corresponding to the data transmission model, the energy collection model and the mobility model, as shown in FIG. 4. Starting from the first sub-slot of the current time slot, the mobile sensor uses the energy remaining at its current position to transmit data, then collects energy from the energy sources in the environment, and then moves to a new position, where it transmits data and collects energy again.

To focus on the interaction between energy collection and data transmission, the present embodiment assumes that during data transmission in the first sub-slot the mobile sensor always has enough data to transmit to the data receiver. With a slight change to the power allocation strategy, the approach can be extended to the case of limited data arrivals. According to the time schedule in FIG. 4, the mobile sensor transmits the collected data to the data receiver in the first sub-slot of each time slot t. Specifically, the wireless communication channel between the mobile sensor and the data receiver is taken to be a Gaussian channel with unit noise power, and the data transmitted under the data transmission model satisfies the following formula:
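One plausible form of this rate, assuming the standard Shannon capacity of a Gaussian channel with unit noise power and a path-loss term |l_t - l_s|^α, is sketched below. The constant factor and the logarithm base are assumptions and may differ from the original formula.

```latex
J_t = \log_2\!\left(1 + \frac{p_t}{\lvert l_t - l_s \rvert^{\alpha}}\right)
```

Under this assumed form, J_t increases with the transmission energy p_t and decreases with the distance to the data receiver, which matches the two monotonic relationships used to justify the reward r_t.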

In the second sub-slot the mobile sensor collects energy, acquiring it from energy source 1 and energy source 2 simultaneously, so that the energy acquired in the energy collection model is represented as follows:

wherein the remaining terms denote the coordinate position of energy source E1 and the coordinate position of energy source E2. β prevents the harvested energy from becoming infinite when the distance between the mobile sensor and an energy source is 0; that is, β bounds the energy the mobile sensor can harvest when it is close to an energy source.
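The role of β can be illustrated with a small sketch. The functional form used here, emitted energy divided by (distance + β), is an assumption chosen only because it is inversely related to distance and stays finite at zero distance; the function name and argument layout are likewise hypothetical.

```python
import math


def harvested_energy(sensor_pos, sources, beta=1.0):
    """Total energy harvested in one slot from all energy sources.

    sources: iterable of (position, emitted_energy) pairs, where
    emitted_energy is the random amount a source generates in this slot.
    Assumed form: emitted / (distance + beta). With beta > 0 the harvest
    is capped at emitted / beta when the sensor sits on top of a source.
    """
    total = 0.0
    for position, emitted in sources:
        distance = math.dist(sensor_pos, position)
        total += emitted / (distance + beta)
    return total
```

With two sources the function simply sums the two contributions, mirroring the statement that the sensor acquires energy from energy source 1 and energy source 2 simultaneously.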

In the third sub-slot the mobile sensor makes its movement decision: after the tasks of the first and second sub-slots are completed, it looks for a better position for the work of the next time slot. The decision has two parts, the moving direction and the moving distance. In this embodiment these two elements form a two-dimensional continuous action space: the direction is updated relative to the angle in the current state, and the speed ranges from a minimum of 0 to a maximum of ρ_max. The mobile sensor then performs the action a_t = {rotation angle, ρ_t}.

The mobile consumption model is represented as:

wherein C_t represents the movement energy consumption of the mobile sensor, i.e. the energy consumed per unit speed multiplied by the speed; ρ_t represents the moving speed of the mobile sensor; and the remaining coefficient represents the energy consumed per unit speed.

Movement consumption is thus the amount of energy the mobile sensor consumes when moving at a given speed for one time slot. The speed of the mobile sensor in the model is not constant; it is a continuous value within a fixed range, and at each time the trained model determines how fast the mobile sensor should advance.

This embodiment assumes that the mobile sensor moves freely in each time slot (a time slot being, for example, the interval from time t to time t+1). Before the movement starts, the mobile sensor has an original direction φ_t. The rotation angle given in the action is added to this original angle to obtain the sensor's new direction in the coordinate system. Once the direction is determined, the sensor moves at the speed ρ_t given in the action, which yields the position coordinates at time t+1.

The freedom of the mobile sensor is restricted to a bounded two-dimensional action set, action = [φ, ρ]. The first dimension represents the moving direction of the sensor, i.e. the change in the angle of the mobile sensor relative to its current state, which is limited by a maximum rotation speed. The second dimension ρ is the radial distance traveled by the mobile sensor in each step, ρ ∈ [ρ_min, ρ_max], where ρ_max = 1 is the maximum distance the mobile sensor can cover during a time slot t, i.e. the maximum speed of the mobile sensor. Assume that the position of the mobile sensor at time t is (x_t, y_t) and its angle is φ_t, and that during time slot t the mobile sensor performs action = [φ_t, ρ_t] with ρ_t ∈ [0, 1]. After one time slot, the position at time t+1 (i.e., the position update model) is:

wherein x_{t+1}, y_{t+1} represent the coordinates of the mobile sensor in the two-dimensional coordinate system at the next time t+1; x_t, y_t represent the position coordinates of the mobile sensor at time t; φ_t represents the direction (i.e., the angle in two-dimensional coordinates) at time t; and the two action values predicted by the model at time t represent the speed and the steering angle, respectively.
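The position update described here (add the rotation angle to the current heading, then advance at the chosen speed for one unit-length slot) can be sketched as follows. The function and argument names are hypothetical, and the cosine/sine decomposition is an assumption that follows from the verbal description rather than from a reproduced formula.

```python
import math


def update_position(x, y, heading, rotation, speed):
    """Return (x, y, heading) at time t+1 for a slot of unit duration.

    heading:  direction phi_t at time t, in radians
    rotation: rotation angle chosen in the action, in radians
    speed:    distance covered in the slot (rho_t)
    """
    new_heading = heading + rotation
    new_x = x + speed * math.cos(new_heading)
    new_y = y + speed * math.sin(new_heading)
    return new_x, new_y, new_heading
```

For example, a sensor at the origin facing along the x-axis that rotates by π/2 and moves at speed 1 ends the slot at approximately (0, 1).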

After the whole time slot t is finished, it follows from the data transmission model, the energy collection model and the mobility model that the battery energy over one time slot t evolves according to the following formula:

The present embodiment does not allow the remaining battery energy of the mobile sensor to exceed the maximum battery capacity at any point during the movement. The formula states that at time t the mobile sensor first consumes energy p_t for data transmission, then receives energy from the two energy sources, and, after this consumption and replenishment, moves to the next position, which consumes energy C_t.
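A minimal sketch of this battery bookkeeping, following the verbal description (subtract the transmission energy, add the harvested energy, subtract the movement cost, cap at the maximum capacity). The function and parameter names are hypothetical, and the exact form of the original formula is assumed rather than reproduced.

```python
def next_battery(battery, p_tx, harvested, move_cost, b_max):
    """Battery energy at time t+1, capped at the maximum capacity b_max.

    battery:   B_t, energy stored at the start of the slot
    p_tx:      p_t, energy spent on data transmission
    harvested: energy received from the energy sources during the slot
    move_cost: C_t, energy spent on movement
    """
    level = battery - p_tx + harvested - move_cost
    return min(level, b_max)
```

Keeping the result non-negative is left to the constraints on p_t and C_t discussed in the surrounding text; this sketch only enforces the upper cap.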

The goal of this work is to maximize the long-term average data throughput from the mobile sensor to the data receiver. As the data transmission model shows, the amount of data transmitted in each time slot is determined by the transmission power and the distance from the mobile sensor to the data receiver. The transmission power available in each time slot is limited by the remaining battery energy, which is affected by the energy harvesting process and by the distances between the mobile sensor and the two energy sources. The mobile sensor therefore needs to reposition itself continually within the area to seek better locations for energy collection and data transmission. For ease of derivation, the T time-slot models over the whole horizon are expressed as:

subject to:

wherein l_t and l_{t+1} represent the positions of the mobile sensor at time t and time t+1, respectively; p_{t+1} represents the energy used by the mobile sensor for data transmission at time t+1; the unit-speed coefficient represents the energy consumed per unit speed; and the movement term denotes the battery energy consumed in moving from position l_t to position l_{t+1} during time slot t. The first formula represents the final optimization objective, i.e., maximizing the long-term average data throughput. The second inequality represents the physical constraints under which the objective is maximized: the moving speed of the mobile sensor is limited both by the battery energy at the beginning of the time slot and by its own maximum moving speed. It also follows from the preceding time-slot model that the energy used for movement and data transmission cannot exceed the total collected energy value for a time slot, so a third inequality must be satisfied.
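The third condition can be read as a consequence of the battery update. A sketch of that reasoning follows, writing E_t for the energy harvested in slot t and c for the energy consumed per unit speed. Both symbols are assumptions introduced here, as is the exact form of the battery update, which is reconstructed from the verbal description.

```latex
\begin{aligned}
B_{t+1} &= \min\{\, B_t - p_t + E_t - C_t,\; B_{\max} \,\}, \qquad C_t = c\,\rho_t, \\
p_{t+1} &\le B_{t+1} \le B_t - p_t + E_t - C_t
\;\Longrightarrow\;
p_{t+1} + C_t \le B_t - p_t + E_t .
\end{aligned}
```

In words: the energy spent on moving during slot t plus the energy spent on transmitting at time t+1 cannot exceed what is left after transmitting at time t and harvesting during the slot.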

In the present embodiment, the distance moved by the mobile sensor in time slot t may also be called the moving speed per unit time. The maximum moving speed of the sensor in a time slot is limited not only by the physical speed limit of the sensor but also by the energy required per unit speed. That is, a speed given in the action by the neural network model is not feasible if the remaining battery energy of the sensor is insufficient to move at that speed. The distance traveled by the mobile sensor from time t to time t+1 should therefore be the minimum of the speed given in the action and the speed permitted by the remaining battery energy.
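The speed limit described above can be sketched as a simple clamp. The function names, the linear cost model (cost = unit cost × speed) and the choice of which energy counts as available for movement are assumptions made for illustration; the caller decides what `available_energy` is.

```python
def movement_cost(speed, unit_cost):
    """Energy consumed by moving at `speed` for one slot (assumed linear)."""
    return unit_cost * speed


def feasible_speed(requested, available_energy, unit_cost, rho_max=1.0):
    """Clamp the speed proposed by the policy to what is physically possible.

    The result is the minimum of the requested speed, the maximum speed
    rho_max, and the speed the available energy can pay for.
    """
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    affordable = available_energy / unit_cost
    return max(0.0, min(requested, rho_max, affordable))
```

For instance, a requested speed of 0.8 with only 0.5 units of energy available at a unit cost of 1 is reduced to 0.5, so the movement never costs more than the sensor can pay.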

It is specified that the sum of the energy the mobile sensor uses for its own movement and the energy it uses for data transmission in a time slot cannot exceed the total energy remaining in its battery at that time. The whole process is as follows: at time t the mobile sensor first uses its battery energy to send data, then receives the energy emitted by the two energy sources at its current position, and then uses the received energy to move, arriving at a new position l_{t+1} at time t+1 after one time slot. At the new position it uses the remaining battery energy B_{t+1} to transmit data again and again receives energy from the two energy sources. The data arriving at the data receiver is accumulated while this cycle of transmitting, harvesting and moving continues until the end of the round.

In step S3, the constructed model is solved with a deep deterministic policy gradient algorithm to obtain the optimal joint trajectory-movement and power allocation strategy.

The deep deterministic policy gradient algorithm is used to solve the movement strategy problem described above, because at the outset the mobile sensor knows nothing about the energy sources, including their positions and energy emission, and does not know which position to move to in order to collect more energy and transmit more data. The action policy is a fitted function realized by a neural network and can be treated as a black box: it takes as input the current state of the mobile sensor, namely its position coordinates, current direction and current battery energy, and decides how to move to a new position that is more favorable for energy collection and data transmission. At the beginning, however, the black box does not know how to assign actions (an action comprises a moving speed and a direction) to the mobile sensor. The algorithm of this embodiment therefore uses two neural networks: one proposes the action, i.e. how much the speed and direction of the mobile sensor should change at the current time, and the other evaluates whether the action proposed by the first network is good or bad. The two networks are updated separately so that their parameters are continuously optimized; the evaluating network's judgments become increasingly accurate and the action policy becomes increasingly good.

The specific process is as follows. First, two neural networks are created, namely an actor network and a critic network, and then two target networks, target_p and target_Q, identical to the actor network and the critic network respectively, are created for temporal-difference training. The whole training process runs for M rounds of T steps each. At the start of a round, hyper-parameters such as the mean and variance of the noise are initialized and the initial state s_1 of the mobile sensor is obtained. s_1 is input to the actor network as the input feature, and the actor network outputs an action through a tanh activation function. Exploration noise epsilon_OU_noise is designed according to the current action and added to the action, and the resulting action is normalized, that is, mapped to the physical interval of the actual environment, in which the angle value lies in [-1, 1] and the speed lies in [0, 1]. The normalized action is executed in the environment, which returns the state at the next time t + 1 and the reward R (where R is the amount of data transmitted by the mobile sensor at time t). The resulting quadruple [s_t, a_t, r_t, s_{t+1}] (where s_t, a_t, r_t and s_{t+1} respectively denote the state of the mobile sensor at time t, the action taken, the reward obtained and the new state at time t + 1) is stored in a buffer. The number of quadruples in the buffer is then checked; when it exceeds the minimum training data size N, the data are input into the critic and critic_target networks for a temporal-difference update, and the actor network is updated by a gradient-ascent strategy. After M rounds of training, the two neural networks can predict and evaluate actions well, and their parameters, once fixed, can be used to guide actions. In testing, a mobile sensor is placed at a random position in the two-dimensional space; after time T it ultimately moves to the theoretically derived optimal data acquisition position, and its trajectory is optimal with respect to the data transmission objective.
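Two pieces of this process lend themselves to a short sketch: mapping the tanh-activated action to the physical interval, and storing quadruples until enough have been collected for training. The following is a minimal sketch under stated assumptions; the names `to_physical` and `ReplayBuffer`, the buffer capacity and the linear mapping are hypothetical, while the intervals [0, 1] for speed and [-1, 1] for angle are taken from the text.

```python
import random
from collections import deque

import numpy as np

def to_physical(action, speed_range=(0.0, 1.0), angle_range=(-1.0, 1.0)):
    """Linearly map an action in [-1, 1]^2 to the (speed, angle) intervals."""
    a = np.clip(action, -1.0, 1.0)
    lo = np.array([speed_range[0], angle_range[0]])
    hi = np.array([speed_range[1], angle_range[1]])
    return lo + (a + 1.0) * 0.5 * (hi - lo)

class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) quadruples."""

    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)  # oldest quadruples are dropped first

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def ready(self, min_size):
        """True once more than min_size quadruples have been collected."""
        return len(self.data) > min_size

    def sample(self, batch_size):
        """Draw a random mini-batch and stack each field into an array."""
        batch = random.sample(list(self.data), batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next
```

In a training loop of this shape, each step would call `store(...)` and only start the critic and actor updates once `ready(N)` returns true.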

The reinforcement learning algorithm is the deep deterministic policy gradient algorithm; DDPG stands for Deep Deterministic Policy Gradient, that is, deep learning plus a deterministic policy gradient. "Deep learning" refers to constructing the policy function with a neural network. "Deterministic policy gradient" can be split into two parts, the deterministic policy and the policy gradient: the deterministic policy means that the constructed function directly outputs a determined action, which makes it usable in environments with continuous actions, and the policy gradient means that the policy network is updated at every single step. By using target networks and an experience replay mechanism, DDPG is able to handle infinite action spaces. The DDPG algorithm is an actor-critic algorithm and is essentially a method that combines a policy gradient with a value function: the policy function is regarded as the actor and, given the state supplied by the current environment, outputs an action from a continuous action space.

The Q network acts as an evaluator. At the beginning it does not know whether the action a output by the policy network is good or bad, so it learns to evaluate the actions output by the actor network through the temporal-difference (TD) method, and through continued learning it gradually comes to evaluate state-action pairs correctly.

The state-action value function Q(s_t, a_t) in the Q network represents the expected cumulative reward obtained by starting from the current state s_t, performing action a_t, and continuing until the end of the round. The state-action value function is as follows:

The reward value r of the current step and the next-step value Q(s', a') are used to fit the future return as Q_target, and the output of the Q network is then driven toward this target value; the loss is therefore constructed directly as the mean squared error between the two Q values. For the Q network update, two Φ-net networks are set up, the second being a copy whose neural network parameters are updated with a delay. (s_t, a_t) and (s_{t+1}, a_{t+1}) are input into the two networks respectively to obtain the corresponding state-action values Q(s_t, a_t) and Q(s_{t+1}, a_{t+1}), which are then combined with the reward value r_t. The constructed loss function is as follows:

The actions a_t and a_{t+1} must come from a policy network that directly outputs a deterministic action for the Q network to evaluate, so the loss function is expressed as:
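As a rough numerical illustration of the critic update described in the text (a target built from r_t and the next-step Q value, followed by a mean squared error), the sketch below computes the target and the loss for a small batch. The discount factor `gamma` is an assumption, since the text does not state one, and the function names and the sample numbers are hypothetical.

```python
import numpy as np

def td_targets(rewards, next_q, gamma=0.99):
    """Q_target = r_t + gamma * Q(s_{t+1}, a_{t+1}) for a batch (gamma is assumed)."""
    return rewards + gamma * next_q

def critic_loss(q_pred, q_target):
    """Mean squared error between predicted and target Q values."""
    return float(np.mean((q_target - q_pred) ** 2))

rewards = np.array([1.0, 0.5])   # r_t for two sampled quadruples
next_q = np.array([2.0, 4.0])    # Q(s_{t+1}, a_{t+1}) from the delayed-update network
q_pred = np.array([2.5, 4.0])    # Q(s_t, a_t) from the network being trained
targets = td_targets(rewards, next_q, gamma=0.9)
loss = critic_loss(q_pred, targets)
```

In an actual implementation the gradient of this loss would be taken only with respect to the parameters of the network being trained, with the delayed-update copy held fixed.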

Here Ψ(s_t, θ_Ψ) denotes the actor network Ψ-net, which is trained to output a better action at each time slot so that, on this basis, the critic network Φ-net outputs a higher value of the corresponding Q function. A better action means that the action a output by the actor network Ψ-net in a state s makes Q(s, a) larger and larger. To increase the value of Q(s, a), the Ψ-net is therefore updated by gradient ascent: the average Q function value over all state-action pairs is obtained first, and the Ψ-net is then updated by gradient ascent on this average, specifically as follows:
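The gradient-ascent idea can be illustrated on a toy problem. The sketch below is not the update of the embodiment: it assumes a hand-written critic Q(s, a) = -(a - w·s)² and a one-parameter actor a = θ·s, both hypothetical, so that the chain rule dQ/dθ = dQ/da · da/dθ can be written out explicitly and averaged over a batch of states.

```python
import numpy as np

def q_value(s, a, w=0.5):
    """Toy critic (an assumption): Q(s, a) is largest when a = w * s."""
    return -(a - w * s) ** 2

def dq_da(s, a, w=0.5):
    """Gradient of the toy critic with respect to the action."""
    return -2.0 * (a - w * s)

def actor(s, theta):
    """Toy deterministic policy: a = theta * s."""
    return theta * s

def actor_ascent_step(theta, states, lr=0.1):
    """One gradient-ascent step on the batch-average Q(s, actor(s))."""
    actions = actor(states, theta)
    # Chain rule: dQ/dtheta = dQ/da * da/dtheta, averaged over the batch.
    grad = np.mean(dq_da(states, actions) * states)
    return theta + lr * grad

states = np.array([0.5, 1.0, 1.5, 2.0])
theta = 0.0
history = []
for _ in range(200):
    history.append(float(np.mean(q_value(states, actor(states, theta)))))
    theta = actor_ascent_step(theta, states)
```

In this toy setting θ approaches w and the average Q value rises toward its maximum of zero, which mirrors the statement that the actor is pushed toward actions the critic scores more highly.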

In addition to the updates of the two main networks, this embodiment also introduces an exploration strategy so that the mobile sensor agent can discover more potentially good actions. Exploration uses Ornstein-Uhlenbeck noise in combination with the ϵ-greedy strategy; the specific algorithm is shown in Fig. 5.
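One plausible way to combine the two exploration mechanisms is sketched below. It should be read as an assumption rather than as the algorithm of Fig. 5: with probability ϵ a uniformly random action is taken, and otherwise Ornstein-Uhlenbeck noise is added to the actor output. The class `OUNoise`, the function `explore` and the parameter values (theta = 0.15, sigma = 0.2) are hypothetical.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.full(dim, mu, dtype=float)

    def reset(self):
        """Return the process to its mean, e.g. at the start of a round."""
        self.x[:] = self.mu

    def sample(self):
        # Euler-Maruyama step: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x.copy()

def explore(actor_action, noise, epsilon, rng):
    """Assumed rule: random action with probability epsilon, else actor + OU noise."""
    if rng.random() < epsilon:
        action = rng.uniform(-1.0, 1.0, size=actor_action.shape)
    else:
        action = actor_action + noise.sample()
    return np.clip(action, -1.0, 1.0)  # keep the action in the normalized range
```

Because successive OU samples are correlated, the perturbation drifts smoothly over consecutive time slots instead of jittering independently at every step, which tends to suit movement-type actions.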

Compared with the prior art, the beneficial effect of this embodiment is:

Example two

The mobile sensor intelligent track design method based on sustainable data acquisition provided by this embodiment differs from that of Embodiment 1 in the following respect:

This embodiment mainly verifies the effectiveness of the proposed training algorithm.

A 10 × 10 two-dimensional rectangular coordinate system is set up, where (x, y) represents a position on the two-dimensional plane (x, y ∈ [0, 10]). Energy source 1 and energy source 2 are located at (0, 10) and (0, 0), respectively. The mobile sensor (MS) can reposition itself within the area to find better energy collection and data transmission locations, with its travel speed limited to [0, 1] and its travel angle limited to [-π/2, π/2]. μ and σ denote the mean and variance of the random distribution of an energy source, and δ denotes the movement cost of the mobile sensor (that is, energy δ is consumed for each unit of distance traveled). Next, the validity of the proposed training algorithm (Algorithm 1) is verified by examining the trajectory learned by the mobile sensor and the convergence of the algorithm. In addition, the effect of different network parameters (such as the initial position of the mobile sensor, the movement cost per unit distance and the position of the data receiver) on the performance of Algorithm 1 is studied. Simulations were performed for the following scenarios:
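The movement part of this simulation setup can be sketched as a single time-slot update. This is a minimal sketch under assumptions that the text does not confirm: the angle action is treated as a change of heading, the position is clipped to the 10 × 10 area, and δ is charged on the distance actually traveled. The function name `move` and the constants are hypothetical labels for the quantities given in the text.

```python
import math

AREA = 10.0              # side length of the square region
MAX_SPEED = 1.0          # travel speed limited to [0, 1]
MAX_TURN = math.pi / 2   # travel angle limited to [-pi/2, pi/2]

def move(x, y, heading, speed, turn, delta):
    """Advance one time slot; returns (x, y, heading, energy_spent)."""
    speed = min(max(speed, 0.0), MAX_SPEED)
    turn = min(max(turn, -MAX_TURN), MAX_TURN)
    heading = heading + turn  # assumption: the angle action is a heading change
    new_x = min(max(x + speed * math.cos(heading), 0.0), AREA)
    new_y = min(max(y + speed * math.sin(heading), 0.0), AREA)
    # Charge delta per unit of distance actually traveled after clipping.
    distance = math.hypot(new_x - x, new_y - y)
    return new_x, new_y, heading, delta * distance
```

For example, `move(5.0, 5.0, 0.0, 1.0, 0.0, 0.1)` moves the sensor one unit along the x axis and charges 0.1 units of energy, while a sensor already at the boundary and heading outward stays in place at no cost.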

Scenario 1: the data receiver is located at (10, 10); energy source 1 and energy source 2 have means of 80 and 30, respectively, and variances of 1; the movement cost δ is set to 0.1; and the initial positions of the mobile sensor are set to (5, 5) and (5, 0), respectively. As shown by the trajectories of line segment 1 and line segment 2 in Fig. 6, the mobile sensor finally stays at the data receiver (10, 10).

Scenario 2: the data receiver is located at (5, 10); energy source 1 and energy source 2 have means of 80 and 30, respectively, and variances of 1; the movement cost δ is set to 0.1; and the initial positions of the mobile sensor are set to (10, 5) and (10, 0), respectively. As shown by the trajectories of line segment 3 and line segment 4 in Fig. 6, the mobile sensor finally stays at the data receiver (5, 10).

Scenario 3: the data receiver is located at (10, 10); energy source 1 and energy source 2 have means of 80 and 30, respectively, and variances of 1; the movement cost δ is set to 1; and the initial positions of the mobile sensor are set to (5, 5) and (1, 5), respectively. As shown by the trajectories of line segment 5 and line segment 6 in Fig. 6, the mobile sensor finally stays at the sub-optimal position, energy source E1 (0, 10).

This embodiment studies the problem of maximizing data throughput in wireless sensor networks based on energy harvesting. A reinforcement learning method is used to address the challenge that the dynamics of the energy supply in each time slot are unknown, so that the mobile sensor learns and optimizes its trajectory only by tracking the total amount of energy received in the current time slot. The DDPG algorithm is further used to handle the continuous action space with deterministic actions. The results show that, regardless of the initial position of the mobile sensor and the position of the receiver, the method is able to identify an optimal trajectory for the goal of maximizing the long-term average data throughput from the mobile sensor to the data collector. The results also indicate that a high movement cost per unit distance may prevent the mobile sensor from identifying the optimal trajectory, sometimes resulting in a sub-optimal solution.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A mobile sensor intelligent track design method based on sustainable data acquisition is characterized by comprising the following steps:

2. The mobile sensor intelligent track design method based on sustainable data acquisition according to claim 1, wherein in step S1 the mobile sensor collecting energy from the surrounding environment specifically comprises: dividing time into equally spaced time slots, the mobile sensor collecting energy in each time slot.

3. The method for designing the intelligent track of the mobile sensor based on the sustainable data collection, according to claim 2, wherein the model constructed in the step S2 is T time slot models; each time slot model of the T time slot models comprises a data transmission model, an energy collection model and a mobility model.

4. A sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 3, wherein the data transmission model is expressed as:

5. The sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 4, wherein the energy collection model is represented as:

wherein,

represents the coordinate position of energy source E1;

represents the total amount of energy that the mobile sensor can harvest from the two energy sources at position l at time t.

6. The sustainable data collection-based intelligent trajectory design method for mobile sensors according to claim 5, wherein the mobility model comprises a mobile consumption model and a location update model;

the mobile consumption model is represented as:

an energy consumption value representing a unit speed;

the location update model is represented as:

indicating the direction of movement of the movement sensor.

7. The sustainable data collection-based intelligent trajectory design method for mobile sensors according to claim 6, wherein each time slot model is represented as:

8. The sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 7, wherein the T time slot models are expressed as:

subject to the constraints: