CN113365222A - Mobile sensor intelligent track design method based on sustainable data acquisition - Google Patents

Publication number
CN113365222A
Authority
CN
China
Prior art keywords
energy
sensor
model
mobile
mobile sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110916516.0A
Other languages
Chinese (zh)
Other versions
CN113365222B (en)
Inventor
贾日恒
张秀铃
林飞龙
郑忠龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU
Priority to CN202110916516.0A
Publication of CN113365222A
Application granted
Publication of CN113365222B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/38 Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W64/00 Locating users or terminals or network equipment for network management purposes, e.g. mobility management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Arrangements For Transmission Of Measured Signals (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a mobile sensor intelligent track design method based on sustainable data acquisition, which comprises the following steps: S1, a mobile sensor harvests energy from the surrounding environment and uses the harvested energy for its own movement and for data transmission; S2, a model relating energy harvesting and data transmission is constructed as a Markov decision process; and S3, the constructed model is solved with a deep deterministic policy gradient (DDPG) algorithm to obtain an optimal joint policy for trajectory movement and energy (power) allocation. With the goal of maximizing the long-term average data collection throughput, the algorithm enables the sensor, without prior knowledge of the environmental energy, to intelligently find the optimal point to reach in each time slot within the whole two-dimensional region, and then to maximize energy harvesting and data transmission in that time slot.

Description

Mobile sensor intelligent track design method based on sustainable data acquisition
Technical Field
The invention relates to the technical field of wireless information transmission, in particular to a mobile sensor intelligent track design method based on sustainable data acquisition.
Background
A wireless sensor network (WSN) is a distributed sensing network whose end devices are sensors that sense and monitor the outside world. The sensors in a WSN communicate wirelessly, so the network can be configured flexibly, device positions can be changed at any time, and the network can be connected to the Internet by wired or wireless links. Each node in the network has sensing capability, and WSNs are often used in scenarios such as ecological environment monitoring, intelligent security patrol, and forest temperature and humidity data acquisition. However, transmitting the data acquired by each node is often challenging, because the capability of a sensor node is limited and the area in which the network is deployed is generally complex and unfavorable for data transmission. The mobile sensor intelligent track design method based on sustainable data acquisition therefore places a mobile data acquisition device in the area where the wireless sensor network is deployed, to collect the data acquired by the scattered sensor nodes. Because the mobile sensor consumes energy for movement and data transmission, and frequent battery replacement is impractical in complex applications, energy harvesting (EH) technology is used so that the WSN can sustain itself and maintain long-term key performance indicators such as data throughput and transmission coverage.
Therefore, wireless charging technology is introduced so that the mobile sensor obtains energy directly from the surrounding environment and becomes self-sufficient. The wireless sensor network can then operate permanently without replaceable batteries or a fixed power grid, which greatly improves the utilization of wireless sensor devices deployed on a large scale. In practical applications, the environmental energy available to the wireless sensor is usually unknown, and the energy harvesting process is random and dynamic; these uncertainties affect long-term key performance measures of the sensor network such as data throughput, sensing coverage, and data transmission. Efficient learning algorithms are therefore needed that enable mobile sensors to achieve the goal of sustainable data collection.
In view of the above problems, the invention provides a mobile sensor intelligent track design method based on sustainable data acquisition that solves these technical problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a mobile sensor intelligent track design method based on sustainable data acquisition. With the goal of maximizing the long-term average data collection throughput, the designed algorithm enables the sensor, without prior knowledge of the environmental energy, to intelligently find the optimal point to reach in each time slot within the whole two-dimensional region, and then to maximize energy harvesting and data transmission in that time slot.
To achieve this purpose, the invention adopts the following technical scheme:
A mobile sensor intelligent track design method based on sustainable data acquisition comprises the following steps:
S1, a mobile sensor harvests energy from the surrounding environment and uses the harvested energy for its own movement and for data transmission;
S2, a model relating energy harvesting and data transmission is constructed as a Markov decision process;
S3, the constructed model is solved with a deep deterministic policy gradient (DDPG) algorithm to obtain an optimal joint policy for trajectory movement and energy (power) allocation.
Further, in step S1, the mobile sensor harvests energy from the surrounding environment as follows: time is divided into equally spaced time slots, and the mobile sensor harvests energy in each time slot.
Further, the model constructed in step S2 comprises T time-slot models; each of the T time-slot models comprises a data transmission model, an energy harvesting model, and a mobility model.
Further, the data transmission model is represented as:
Figure 592735DEST_PATH_IMAGE001
where J_t represents the data transmission model, i.e., the amount of data transmitted in time slot t; p_t represents the energy used for data transmission; l_t represents the position of the mobile sensor; l_s represents the position of the data receiver; |l_t - l_s| represents the distance from the mobile sensor to the data receiver; and α represents the distance loss factor.
Further, the energy harvesting model is represented as:
Figure 718954DEST_PATH_IMAGE002
where
Figure 559871DEST_PATH_IMAGE003
represents the amount of energy harvested from energy source E1 by the mobile sensor at time t;
Figure 286387DEST_PATH_IMAGE004
represents the amount of energy harvested from energy source E2 by the mobile sensor at time t;
Figure 605373DEST_PATH_IMAGE005
represents the coordinate position of energy source E1;
Figure 976312DEST_PATH_IMAGE006
represents the coordinate position of energy source E2; β sets an upper bound on the energy that the mobile sensor can harvest;
Figure 863496DEST_PATH_IMAGE007
represents the total energy that the mobile sensor at its current position can harvest from the two energy sources at time t.
Further, the mobility model comprises a mobility consumption model and a location update model;
the mobile consumption model is represented as:
Figure 421516DEST_PATH_IMAGE008
where C_t represents the movement energy consumption of the mobile sensor; ρ_t represents the speed at which the mobile sensor moves;
Figure 435871DEST_PATH_IMAGE009
represents the energy consumed per unit speed;
the location update model is represented as:
Figure 661316DEST_PATH_IMAGE010
where x_{t+1} and y_{t+1} represent the coordinates of the mobile sensor in the two-dimensional coordinate system at the next time t+1; x_t and y_t represent the position coordinates of the mobile sensor at time t; φ_t represents the direction at time t;
Figure 578457DEST_PATH_IMAGE011
represents the direction of movement of the mobile sensor.
Further, each of the timeslot models is represented as:
Figure 30298DEST_PATH_IMAGE012
where B_{t+1} represents the battery level at time t+1 given by each time-slot model; B_max represents the maximum battery capacity of the mobile sensor; and B_t represents the remaining battery level of the mobile sensor at time t.
Further, the T slot models are represented as:
Figure 956665DEST_PATH_IMAGE013
subject to:
Figure 161251DEST_PATH_IMAGE014
Figure 514871DEST_PATH_IMAGE015
Figure 188429DEST_PATH_IMAGE016
where l_t and l_{t+1} represent the positions of the mobile sensor at time t and time t+1, respectively; p_{t+1} represents the energy used by the mobile sensor for data transmission at time t+1.
Compared with the prior art, the invention has the beneficial effects that:
(1) the method generalizes well: using the trained parameter model, the mobile sensor can locate a better collection point for energy harvesting and data transmission even in an unknown area;
(2) a two-layer fully connected deep neural network is used to approximate the value function, and an actor-critic algorithm is adopted to solve the policy optimization problem over a continuous action space;
(3) exploration proceeds without knowledge of the energy distribution or any other information about the energy: for different initial positions of the mobile sensor and different positions of the data receiver (sink), the mobile sensor intelligently searches for an optimal trajectory to reach the theoretically optimal position for energy harvesting and data transmission; a high movement cost per unit distance may prevent the mobile sensor from identifying the optimal trajectory and may lead to a suboptimal solution;
(4) the model moves continuously, to match a real physical environment with momentum, in which the movement speed and the rotation angle can take any value within a given range. Accordingly, a time-correlated exploration strategy is provided to suit the physical control process of a mobile sensor with inertia, which improves exploration efficiency during training;
(5) the method is technically significant for making energy-harvesting wireless sensor networks self-sustaining and for coping with the randomness and unpredictability of environmental energy, and it is valuable for the large-scale deployment and use of wireless sensor networks, the fullest use of environmental resources, and the reduction of deployment cost.
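As an illustration of the time-correlated exploration mentioned in item (4), the following minimal Python sketch generates exploration noise with an Ornstein-Uhlenbeck process, a common choice in deep deterministic policy gradient implementations. The use of this particular process, the class name OUNoise, and all parameter values (theta, sigma, dt) are assumptions made for illustration; the text does not specify them.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: noise whose successive samples are correlated.

    Hypothetical helper; the parameter defaults are illustrative assumptions.
    """

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, x0=0.0, seed=None):
        self.mu = mu          # long-run mean that the noise reverts to
        self.theta = theta    # strength of the mean reversion
        self.sigma = sigma    # scale of the random perturbation
        self.dt = dt          # step length (one time slot)
        self.x = x0           # current noise value
        self._rng = random.Random(seed)

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self._rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x
```

Adding one such sample to the rotation angle and another to the speed, then clipping both to their valid ranges, would give exploration that changes smoothly from one time slot to the next rather than jumping independently at every step.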
Drawings
FIG. 1 is a flowchart of a mobile sensor intelligent trajectory design method based on sustainable data collection according to an embodiment;
FIG. 2 is a schematic diagram of a mobile model of a mobile sensor for collecting energy and transmitting data according to an embodiment;
FIG. 3 is a system interaction model diagram of an actor-critic-based reinforcement learning framework provided by an embodiment;
FIG. 4 is a diagram of dividing each time slot t into three sub-time slots according to the first embodiment;
FIG. 5 is a schematic diagram of an algorithm provided in accordance with one embodiment;
FIG. 6 is a schematic diagram of the trajectory of the mobile sensor provided in the second embodiment.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
The invention aims to provide a mobile sensor intelligent track design method based on sustainable data acquisition, aiming at the defects of the prior art.
Example one
This embodiment provides a mobile sensor intelligent track design method based on sustainable data acquisition which, as shown in FIG. 1, comprises the following steps:
S1, a mobile sensor harvests energy from the surrounding environment and uses the harvested energy for its own movement and for data transmission;
S2, a model relating energy harvesting and data transmission is constructed as a Markov decision process;
S3, the constructed model is solved with a deep deterministic policy gradient (DDPG) algorithm to obtain an optimal joint policy for trajectory movement and energy (power) allocation.
In step S1, the mobile sensor harvests energy from the surrounding environment and uses the harvested energy for its own movement and for data transmission.
The mobile sensor in this embodiment harvests energy from its surroundings and uses it to move. While moving, it collects data from nearby nodes, such as temperature readings obtained by temperature sensors, and transmits the collected data to the data receiver. Data transmission also consumes the mobile sensor's energy, so the harvested energy is used for data transmission as well.
This embodiment considers the throughput maximization problem of an energy-harvesting wireless sensor network in which only stochastic system information is available. In particular, consider a mobile sensor whose primary function is to move through the network, collect data, and transmit the collected data to a data receiver (sink). The mobile sensor can replenish itself by harvesting energy from the surrounding environment. As shown in FIG. 2, assume that the surrounding environment has two energy sources, energy source 1 and energy source 2 (this embodiment is not limited to two energy sources; there may be more), from which the mobile sensor can obtain energy. Specifically, time is divided into equally spaced time slots, and energy is harvested in each time slot.
In this embodiment, the amount of energy captured by the mobile sensor is determined by the following factors:
Energy sources: energy sources 1 and 2 are assumed to be independent, each randomly generating a certain amount of energy in each time slot. Thus, even if the mobile sensor stays at the same location for a period of time, the energy it receives may vary, because the energy is typically affected by environmental changes.
Position of the mobile sensor: as the mobile sensor moves within the wireless sensor network, its location determines how much energy is available from energy sources 1 and 2. For example, the closer the mobile sensor is to energy source 1, the more energy it can capture from energy source 1, since the power at which the mobile sensor captures energy generally decreases with the distance between the energy source and the mobile sensor.
Trade-off between energy harvesting and data transmission: the closer the mobile sensor is to an energy source, the more energy it acquires and the more energy it can devote to data transmission; however, approaching the energy source may increase the distance from the mobile sensor to the data receiver, which may degrade data transmission.
In step S2, a model relating energy harvesting and data transmission is constructed as a Markov decision process (MDP).
To address the above problem, this embodiment first models energy harvesting and data transmission jointly as a Markov decision process, since at the beginning of each time slot the mobile sensor decides its future power allocation and movement based on the current situation. Then, based on the MDP formulation, a deep deterministic policy gradient (DDPG) algorithm is employed to identify an optimal joint policy for trajectory movement and harvested-energy power allocation (i.e., a function that maps the current situation to the best decision in each time slot), so as to maximize the long-term average data throughput from the mobile sensor to the data receiver.
This embodiment treats the mobile sensor as an intelligent agent that learns and optimizes its decisions through continuous interaction with the environment. S and A are defined as the state space and the action space, respectively, containing all possible states and actions. The state transition probability P(s_{t+1} | s_t, a_t) is the probability that executing action a_t in state s_t causes a transition to state s_{t+1}. R denotes the reward, and the discount factor γ represents the degree of importance attached to future rewards. In this model, the MDP M = {S, A, P, R, γ} is defined to model the interaction process of the mobile sensor, as shown in FIG. 3.
State (S): the state of the system before the start of time slot t is defined as s_t = {B_t, x_t, y_t, φ_t}, comprising the remaining battery level of the mobile sensor, its position coordinates, and its direction at time t. By the Markov property, the state s_t carries all the historical state information prior to t, and the mobile sensor can make further movement and power allocation decisions based on this information.
Action (A): an action is the movement and power allocation decision made at each position. The movement decision consists of two parts: the direction of movement of the mobile sensor, determined by selecting the rotation angle
Figure 587367DEST_PATH_IMAGE018
and the moving speed ρ_t ∈ [0, ρ_max]. These two parameters constitute the movement decision for time slot t. In addition, the mobile sensor needs to determine the amount of energy p_t used to transmit data to the data receiver. Thus, the action taken by the mobile sensor in each time slot is expressed as: a_t = {p_t,
Figure 637DEST_PATH_IMAGE019
t}. At the beginning of each time slot, it is assumed that all of the energy remaining in the mobile sensor's battery is used for data transmission, i.e., p_t = B_t, so the action is redefined as: a_t = {
Figure 286125DEST_PATH_IMAGE019
t}.
Reward (R): since the ultimate goal of this embodiment is to maximize the long-term average data throughput from the mobile sensor to the data receiver, the reward is defined in terms of the amount of data transmitted per time slot. This embodiment defines the reward r_t available to the mobile sensor in each time slot as the ratio of the remaining battery level of the mobile sensor at the current time to the distance between the mobile sensor and the data receiver raised to the power α, i.e., r_t = B_t / |l_t - l_s|^α. The main reasons are as follows. On the one hand, the data throughput of each time slot increases with the remaining battery level B_t of the mobile sensor: if more energy can be harvested and stored in advance, more energy can subsequently be allocated to data transmission. On the other hand, the data throughput of each time slot is inversely related to the distance |l_t - l_s| from the mobile sensor to the data receiver. Thus, the higher the reward r_t, the more the mobile sensor tends toward a better location where more energy can be harvested and the distance to the data receiver is shorter. Note that r_t is the only feedback that the mobile sensor can observe from the environment after performing an action; explicit information about the energy sources and the data receiver is not available.
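The reward defined above can be written as a short function. The following Python sketch computes r_t = B_t / |l_t - l_s|^α from a battery level and two positions; the function name reward, the tuple representation of positions, and the default α = 2 are assumptions for illustration.

```python
import math


def reward(battery, sensor_pos, sink_pos, alpha=2.0):
    """Return r_t = B_t / |l_t - l_s|**alpha.

    battery    -- remaining battery level B_t
    sensor_pos -- (x, y) position l_t of the mobile sensor
    sink_pos   -- (x, y) position l_s of the data receiver
    alpha      -- distance loss factor (default value is an assumption)
    """
    distance = math.dist(sensor_pos, sink_pos)
    if distance == 0.0:
        # The ratio is undefined when the sensor sits exactly on the receiver.
        raise ValueError("sensor and data receiver coincide")
    return battery / distance ** alpha
```

For a fixed battery level, halving the distance to the data receiver multiplies the reward by 2^α, which is what pushes the learned trajectory toward the receiver whenever enough energy is available.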
State transition probability: the state transition probabilities characterize the system dynamics across time slots. Because the state space and the action space are continuous, the state transition probability space is also continuous and infinite and is difficult to obtain explicitly, so traditional offline optimization techniques cannot derive the optimal movement and power allocation policy.
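Because the transition probabilities are not available in closed form, a model-free method learns from sampled transitions (s_t, a_t, r_t, s_{t+1}) instead. The sketch below shows two standard ingredients of the deep deterministic policy gradient algorithm: the bootstrapped critic target and the soft update of target-network weights. Networks are abstracted as plain numbers and lists; the function names and the values of gamma and tau are illustrative assumptions, not values given in the text.

```python
def td_target(reward, next_q_value, gamma=0.99):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q_value is the target critic's estimate for the next state,
    evaluated at the action chosen by the target actor.
    """
    return reward + gamma * next_q_value


def soft_update(target_weights, online_weights, tau=0.005):
    """Move each target weight a fraction tau toward the online weight."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_weights, target_weights)]
```

The slowly moving target weights keep the regression target from shifting as fast as the network being trained, which is the usual reason for the soft update in actor-critic methods of this kind.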
In light of the foregoing, this embodiment deploys the energy-harvesting wireless sensor network in a two-dimensional rectangular area, in which the mobile sensor extracts energy from two environmental energy sources and transmits the collected data to the data receiver over a wireless channel. To obtain better energy harvesting and data transmission performance, the mobility policy must be optimized.
The mobile sensor can obtain energy from the surrounding environment, such as solar or wind energy, and it obtains energy from energy sources 1 and 2 to sustain its movement and data transmission. In this embodiment, the whole time horizon is first divided into T discrete time slots of equal length, and each time slot t is divided into three sub-slots, corresponding to the data transmission model, the energy harvesting model, and the mobility model, as shown in FIG. 4. In the first sub-slot of the current time slot, the mobile sensor uses the energy remaining in its battery to transmit data at its current position; it then harvests energy from the energy sources in the environment; finally, it moves to a new position, where data transmission and energy harvesting begin again.
To focus on the interaction between energy harvesting and data transmission, for the data transmission in the first sub-slot this embodiment assumes that the mobile sensor always has enough data to transmit to the data receiver. With a slight change to the power allocation policy, the method can be extended to the case of limited data arrival. According to the time schedule in FIG. 4, the mobile sensor transmits the collected data to the data receiver in the first sub-slot of each time slot t. Specifically, assuming that the wireless channel between the mobile sensor and the data receiver is a Gaussian channel with unit noise power, the data transmitted under the data transmission model satisfies the following formula:
Figure 164082DEST_PATH_IMAGE020
where J_t represents the data transmission model, i.e., the amount of data transmitted in time slot t; p_t represents the energy used for data transmission; l_t represents the position of the mobile sensor; l_s represents the position of the data receiver; |l_t - l_s| represents the distance from the mobile sensor to the data receiver; and α represents the distance loss factor.
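The formula image for the data transmission model is not reproduced in this text. A form consistent with the symbols defined above and with a Gaussian channel of unit noise power is the Shannon-capacity expression J_t = (1/2) log2(1 + p_t / |l_t - l_s|^α). The Python sketch below assumes that form; the factor 1/2, the base-2 logarithm, the function name throughput, and the default α = 2 are all assumptions rather than details confirmed by the source.

```python
import math


def throughput(p_t, sensor_pos, sink_pos, alpha=2.0):
    """Assumed per-slot throughput over a Gaussian channel with unit noise power.

    p_t is the energy spent on transmission; the received signal-to-noise
    ratio is taken as p_t divided by distance**alpha.
    """
    distance = math.dist(sensor_pos, sink_pos)
    if distance == 0.0:
        raise ValueError("sensor and data receiver coincide")
    received_snr = p_t / distance ** alpha  # noise power is 1
    return 0.5 * math.log2(1.0 + received_snr)
```

Under this assumed form the throughput grows only logarithmically with p_t but falls quickly with distance, which matches the trade-off described earlier between approaching an energy source and staying near the data receiver.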
In the second sub-slot, energy is harvested: the mobile sensor obtains energy from energy source 1 and energy source 2 simultaneously, and the energy obtained under the energy harvesting model is expressed as:
Figure 218626DEST_PATH_IMAGE002
where
Figure 914050DEST_PATH_IMAGE003
represents the amount of energy harvested from energy source E1 by the mobile sensor at time t;
Figure 545888DEST_PATH_IMAGE004
represents the amount of energy harvested from energy source E2 by the mobile sensor at time t;
Figure 352170DEST_PATH_IMAGE005
represents the coordinate position of energy source E1;
Figure 402166DEST_PATH_IMAGE006
represents the coordinate position of energy source E2; β prevents the harvested energy from becoming infinite when the distance between the mobile sensor and an energy source is 0, i.e., β bounds the energy that the mobile sensor can harvest when it is close to an energy source;
Figure 268490DEST_PATH_IMAGE007
represents the total energy that the mobile sensor at its current position can harvest from the two energy sources at time t.
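The formula image for the energy harvesting model is likewise not reproduced. One form consistent with the description, in which β keeps the harvested energy finite when the distance is 0, divides the energy generated by each source by the distance raised to a power plus β. The Python sketch below assumes exactly that; the placement of β in the denominator, the use of the same exponent α as in the transmission model, and all names are assumptions for illustration.

```python
import math


def harvested_energy(sensor_pos, source_positions, generated, alpha=2.0, beta=1.0):
    """Assumed total energy harvested in one time slot.

    source_positions -- list of (x, y) positions of the energy sources
    generated        -- energy randomly generated by each source in this slot
    beta             -- positive offset that bounds each term by generated / beta
    """
    total = 0.0
    for position, energy in zip(source_positions, generated):
        distance = math.dist(sensor_pos, position)
        total += energy / (distance ** alpha + beta)
    return total
```

With this form, a sensor standing exactly on a source harvests at most the generated energy divided by β from it, so the bound described in the text holds for any position.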
In the third sub-slot, the mobile sensor makes its movement decision: after the tasks of the first and second sub-slots are completed, it searches for a better position at which to work in the next time slot. The decision comprises two parts, the movement direction and the movement distance. In this embodiment, these two elements are defined as a two-dimensional continuous action space: the direction is updated relative to the angle in the current state, and the speed is at least 0 and at most ρ_max. The action performed is a_t = {
Figure 154669DEST_PATH_IMAGE019
t}.
The mobile consumption model is represented as:
Figure 764642DEST_PATH_IMAGE021
where C_t represents the movement energy consumption of the mobile sensor, i.e., the total energy consumed by movement in the time slot; ρ_t represents the speed at which the mobile sensor moves;
Figure 528199DEST_PATH_IMAGE022
represents the energy consumed per unit speed.
The movement consumption is the amount of energy that the mobile sensor consumes when moving at a given speed in one time slot. The speed of the mobile sensor in this model is not constant; it is a continuous value within a fixed range, and the trained model determines at each moment how fast the mobile sensor should advance.
This embodiment assumes that the mobile sensor moves freely in each time slot (a time slot being, for example, the interval from time t to time t+1). Before the movement starts, the sensor has an original direction φ_t; the rotation angle given in the action,
Figure 440791DEST_PATH_IMAGE019
is then combined with the original angle to obtain (φ_t +
Figure 188167DEST_PATH_IMAGE019
) as its direction in the coordinate system. Once the direction is determined, the sensor moves at the speed ρ_t given in the action, which yields the position coordinates at time t+1.
The freedom of the mobile sensor is restricted to a bounded two-dimensional action set action = [φ, ρ]. The first dimension represents the movement direction of the sensor, i.e., the rotation angle of the mobile sensor in the current state is
Figure 460886DEST_PATH_IMAGE019
with a maximum rotational speed of
Figure 344528DEST_PATH_IMAGE023
The second dimension ρ is the radial distance travelled by the mobile sensor in each step, ρ ∈ [ρ_min, ρ_max], where ρ_max, the maximum distance that the mobile sensor can cover during time slot t (i.e., its maximum speed), is 1. Assume that the initial position of the mobile sensor is (x_t, y_t) and its initial angle is φ_t. In time slot t the mobile sensor performs action = [φ_t, ρ_t], where
Figure 693601DEST_PATH_IMAGE024
, ρ ∈ [0, 1]; after the time slot, the position at time t+1 (i.e., the position update model) is:
Figure 397115DEST_PATH_IMAGE010
where x_{t+1} and y_{t+1} represent the coordinates of the mobile sensor in the two-dimensional coordinate system at the next time t+1; x_t and y_t represent the position coordinates of the mobile sensor at time t; φ_t represents the direction (i.e., the angle in two-dimensional coordinates) at time t;
Figure 348890DEST_PATH_IMAGE025
represents the two action values predicted by the model at time t, namely the speed and the steering angle.
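The position update described above can be sketched as follows: the rotation angle from the action is added to the current direction, and the sensor then advances by ρ_t along the new direction. Because the formula image is not reproduced, the cosine and sine form below is an assumption that follows from the description of a direction angle and a radial distance; the function name update_pose and the argument name delta_phi are hypothetical.

```python
import math


def update_pose(x, y, phi, delta_phi, rho):
    """Return (x_next, y_next, phi_next) after one movement step.

    phi       -- direction of the sensor at time t, in radians
    delta_phi -- rotation angle chosen by the action, in radians
    rho       -- distance travelled in the slot (speed per unit time)
    """
    heading = phi + delta_phi
    x_next = x + rho * math.cos(heading)
    y_next = y + rho * math.sin(heading)
    return x_next, y_next, heading
```

A complete environment would also clip delta_phi and rho to their allowed ranges and keep the sensor inside the rectangular area; both steps are omitted here for brevity.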
After the whole time slot t is finished, it follows from the data transmission model, the energy harvesting model, and the mobility model that the battery energy transition over time slot t satisfies the following formula:
Figure 733645DEST_PATH_IMAGE026
where B_{t+1} represents the battery level at time t+1 given by each time-slot model; B_max represents the maximum battery capacity of the mobile sensor; and B_t represents the remaining battery level of the mobile sensor at time t.
This embodiment does not allow the remaining battery level of the mobile sensor to exceed the maximum battery capacity at any point during the movement. Here
Figure 112674DEST_PATH_IMAGE027
indicates that at time t the mobile sensor first consumes p_t for data transmission, then receives energy
Figure 444429DEST_PATH_IMAGE028
from the two energy sources, and, after this consumption and replenishment, moves to the next position, consuming C_t for the movement.
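The battery transition just described (spend p_t on transmission, add the harvested energy, subtract the movement cost C_t, and never exceed B_max) can be sketched in Python as below. The exact formula image is not reproduced, so the expression min(B_t - p_t + E_t - C_t, B_max) is an assumption that follows the order of events in the text; the lower clamp at 0 and all names are additions made for illustration.

```python
def next_battery(b_t, p_t, e_t, c_t, b_max):
    """Return the assumed battery level B_{t+1} after one time slot.

    b_t   -- battery level at the start of the slot
    p_t   -- energy spent on data transmission (cannot exceed b_t)
    e_t   -- energy harvested in the slot
    c_t   -- energy spent on movement
    b_max -- maximum battery capacity
    """
    if p_t > b_t:
        raise ValueError("transmission energy exceeds the battery level")
    level = b_t - p_t + e_t - c_t
    # Clamp below at 0 as a safeguard (not stated in the text), above at b_max.
    return min(max(level, 0.0), b_max)
```

With the earlier assumption p_t = B_t, the expression reduces to min(E_t - C_t, B_max): whatever is harvested and not spent on movement becomes the transmission budget of the next slot.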
The goal of this work is to maximize the long-term average data throughput from the mobile sensor to the data receiver. As can be seen from the data transmission model, the amount of data transmitted during each time slot is determined by the transmission power and the distance from the mobile sensor to the data receiver. The transmission power available in each time slot is limited by the remaining battery level, which is affected by the energy harvesting process and the distance between the mobile sensor and the two energy sources. Therefore, the mobile sensor needs to be constantly repositioned in the area to seek better energy harvesting and data transmission locations. For ease of derivation, the T time-slot models over the whole time horizon are expressed as:
Figure 934317DEST_PATH_IMAGE029
subject to:
Figure 792551DEST_PATH_IMAGE030
Figure 732694DEST_PATH_IMAGE031
Figure 410800DEST_PATH_IMAGE032
where l_t and l_{t+1} represent the positions of the mobile sensor at time t and time t+1, respectively; p_{t+1} represents the energy used by the mobile sensor for data transmission at time t+1;
Figure 579745DEST_PATH_IMAGE033
represents the energy consumption per unit speed;
Figure 292486DEST_PATH_IMAGE034
represents the battery energy consumed in time slot t by moving from position l_t to position l_{t+1}. The first formula represents the final optimization objective, i.e., maximizing the long-term average data throughput. The second inequality represents the physical constraints that must be met while maximizing the objective: the movement speed of the mobile sensor is subject to two limits, the battery energy at the beginning of the time slot and the sensor's own maximum movement speed. It also follows from the per-slot model above that the energy used for movement and data transmission cannot exceed the total energy collected for the time slot, so the third inequality must be satisfied.
In the present embodiment,
Figure 013317DEST_PATH_IMAGE035
is the distance moved by the mobile sensor in time slot t, which can also be regarded as the moving speed per unit time. The maximum moving speed of the sensor in a time slot,
Figure 070397DEST_PATH_IMAGE036
is limited not only by the physical speed limit of the sensor but also by the energy required per unit speed. That is, even if the action output by the neural network model specifies the speed
Figure 902087DEST_PATH_IMAGE036
this speed is infeasible when the remaining battery energy of the sensor is insufficient to sustain it. The distance traveled by the mobile sensor from time t to time t+1 is therefore the minimum of that speed and the speed permitted by the remaining battery energy.
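The speed limit described above can be illustrated with a small helper. This is a minimal sketch that assumes movement energy grows linearly with speed (C_t equals the per-unit-speed cost times the speed), which is how the text describes the energy consumption per unit speed. The name `feasible_speed` and its parameters are hypothetical.

```python
def feasible_speed(requested: float, v_max: float,
                   battery: float, cost_per_unit_speed: float) -> float:
    """Clip the speed proposed by the policy to what the sensor can afford.

    Assumes a linear movement cost, so the battery alone allows a speed of
    battery / cost_per_unit_speed.
    """
    if cost_per_unit_speed <= 0:
        raise ValueError("cost_per_unit_speed must be positive")
    energy_limited = battery / cost_per_unit_speed
    # The effective speed is the smallest of the three limits, never negative.
    return max(0.0, min(requested, v_max, energy_limited))
```

With `requested=0.8`, `v_max=1.0`, `battery=1.0` and `cost_per_unit_speed=2.0`, the battery is the binding limit and the result is 0.5.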
It is specified that the sum of the energy the mobile sensor uses for its own movement and the energy it uses for data transmission within a time slot cannot exceed the total remaining energy of its battery at that time. The whole process is as follows: at time t, the mobile sensor first uses its battery energy to send data; it then receives, at its current position, the energy emitted by the two energy sources and uses the received energy for its own movement; after one time slot it reaches a new position l_{t+1} at time t+1. At the new position it uses the remaining battery energy B_{t+1} at time t+1 to transmit data again, and it again receives energy from the two energy sources. The data arriving at the data receiver is accumulated while this cycle of transmitting, harvesting and moving repeats with the newly received energy until the end of the round.
In step S3, the constructed model is solved with the deep deterministic policy gradient (DDPG) algorithm to obtain the optimal joint strategy for trajectory movement and energy (power) allocation.
The deep deterministic policy gradient algorithm is used to solve the action-strategy problem described above, because at the beginning the mobile sensor knows nothing about the energy sources, including their positions and energy emission, and does not know which position would favor its energy collection and data transmission. The action strategy is a fitted function implemented by a neural network and can be treated as a black box: its input is the current state of the mobile sensor, namely its position coordinates, current direction and current battery energy, and it decides how the sensor should move to reach a new position that is more favorable for the mobile sensor's data transmission. At the beginning, however, the black box does not know how to assign actions (an action comprises a moving speed and a direction) to the mobile sensor. The algorithm of this embodiment therefore uses two neural networks: the first gives an action, i.e., how much the speed and direction of the mobile sensor should change at the current time; the second evaluates whether the action given by the first network is good or bad. The two networks are updated separately so that their parameters are optimized continuously: the evaluation of actions becomes more and more accurate, and the action strategy becomes better and better.
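To make the state and action concrete, the following sketch represents the state as position, direction and battery energy, and applies an action consisting of a speed and a change of direction. The exact location update formula is an image that is not reproduced here, so the update below (turn first, then advance along the new heading, then clip to the square area) is an assumption; `SensorState`, `move` and the default area side of 10 (taken from the second embodiment) are illustrative names and values.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SensorState:
    x: float        # position on the plane
    y: float
    heading: float  # current direction, in radians
    battery: float  # remaining battery energy


def move(state: SensorState, speed: float, turn: float,
         area: float = 10.0) -> SensorState:
    """Apply one action (speed, change of direction) to the position.

    Assumed update: the heading is changed by `turn`, the sensor advances
    `speed` along the new heading, and the result is clipped to the square
    [0, area] x [0, area]. Battery accounting is handled separately.
    """
    heading = state.heading + turn
    x = min(max(state.x + speed * math.cos(heading), 0.0), area)
    y = min(max(state.y + speed * math.sin(heading), 0.0), area)
    return replace(state, x=x, y=y, heading=heading)
```

Starting from `SensorState(5.0, 5.0, 0.0, 3.0)`, the action `(speed=1.0, turn=0.0)` moves the sensor to (6.0, 5.0) and leaves the battery field untouched.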
The specific process is as follows. First, two neural networks are created, an actor network and a critic network; then two target networks, target_p and target_Q, identical to the actor network and the critic network respectively, are created for temporal-difference training. The whole training process runs for M rounds of T steps each. At the start of a round, hyper-parameters such as the mean and variance of the noise are initialized and the initial state s1 of the mobile sensor is obtained. s1 is fed as the input feature to the actor network, which outputs an action through a tanh activation. Exploration noise ε_OU_noise is designed according to the current action and added to the actor's output, and the resulting action is normalized, i.e., mapped to the physical intervals of the actual environment, in which the angle takes values in [-1, 1] and the speed in [0, 1]. The normalized action is executed in the environment, which returns the state at the next time t+1 and the reward R, where R is the amount of data transmitted by the mobile sensor at time t:
Figure 344701DEST_PATH_IMAGE037
The resulting quadruple [s_t, a_t, r_t, s_{t+1}] (s_t, a_t, r_t and s_{t+1} denote, respectively, the state of the mobile sensor at time t, the action taken, the reward obtained and the new state (position) at time t+1) is stored in a buffer. The number of quadruples in the buffer is checked, and once it exceeds the minimum amount of training data N, the data are fed into the critic and critic_target networks for a temporal-difference update, and the actor network is updated by gradient ascent. After M rounds of training, the two neural networks can predict and evaluate actions well, and their parameters, once fixed, can be used to guide actions. In testing, a mobile sensor placed at a random position in the two-dimensional space moves, after time T, to the theoretically derived optimal data acquisition position, and its trajectory is optimal with respect to the data transmission objective.
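Two pieces of the training loop above lend themselves to a short illustration: mapping the tanh-range network output to physical action ranges, and the buffer that stores quadruples until enough data is available. This is a minimal sketch, not the patent's implementation; the names `to_physical` and `ReplayBuffer`, the default bounds, and the uniform sampling are assumptions.

```python
import math
import random
from collections import deque


def to_physical(raw_action, max_speed=1.0, max_turn=math.pi / 2):
    """Map a raw (speed, turn) pair in [-1, 1] x [-1, 1] to physical ranges.

    The bounds are parameters; the defaults are illustrative only.
    """
    raw_speed = max(-1.0, min(1.0, raw_action[0]))
    raw_turn = max(-1.0, min(1.0, raw_action[1]))
    speed = (raw_speed + 1.0) / 2.0 * max_speed   # [-1, 1] -> [0, max_speed]
    turn = raw_turn * max_turn                    # [-1, 1] -> [-max_turn, max_turn]
    return speed, turn


class ReplayBuffer:
    """Stores (s_t, a_t, r_t, s_{t+1}) quadruples for later training."""

    def __init__(self, capacity, min_size):
        self.data = deque(maxlen=capacity)  # oldest entries are dropped first
        self.min_size = min_size            # the minimum training data N

    def add(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def ready(self):
        # Training starts once the buffer holds more than N quadruples.
        return len(self.data) > self.min_size

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.data), batch_size)
```

In a training loop, each environment step would call `add(...)`, and the critic and actor updates would only run when `ready()` returns true.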
The reinforcement learning algorithm used is DDPG, whose full name is Deep Deterministic Policy Gradient. "Deep" refers to building the function approximators with neural networks. "Deterministic Policy Gradient" can be split into two parts, deterministic policy and policy gradient: a deterministic policy means that the constructed function directly outputs a single determined action, which makes it suitable for environments with continuous actions; policy gradient refers to a policy network that is updated at every single step. By using target networks and an experience replay mechanism, DDPG is able to handle an infinite action space. The DDPG algorithm is an actor-critic algorithm and is essentially a method that combines the policy gradient with a value function. The policy function acts as the actor: given the state of the current environment, it outputs an action from the continuous action space
Figure 502012DEST_PATH_IMAGE038
The Q network acts as the evaluator (critic). At the beginning, the Q network does not know whether an action a output by the policy network is good or bad, so it learns to evaluate the actions output by the actor network through the temporal-difference (TD) method, and through continued learning it gradually comes to evaluate state-action pairs correctly.
The state-action value function Q(s_t, a_t) in the Q network denotes the expected cumulative reward obtained by starting from the current state s_t, performing action a_t, and continuing until the end of the round. The state-action value function is as follows:
Figure 154711DEST_PATH_IMAGE040
The reward r of the current step and the next-step value Q(s', a') are used to fit the future return as Q_target, and the output of the Q network is driven toward this target value; the loss is therefore constructed directly as the mean square error between the two Q values. For the Q-network update, two Φ-net networks are set up, one of which has its neural network parameters updated with a delay. (s_t, a_t) and (s_{t+1}, a_{t+1}) are fed into the two networks respectively to obtain the corresponding state-action values Q(s_t, a_t) and Q(s_{t+1}, a_{t+1}), which are then combined with the reward r_t to construct the loss function as follows:
Figure 914725DEST_PATH_IMAGE041
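The construction of the critic loss can be sketched numerically. The loss formula itself is an image that is not reproduced here, so this sketch assumes the usual temporal-difference form: the target is r_t plus a discounted next-step value, and the loss is the mean squared difference between target and prediction. The discount factor `gamma` and the function names are assumptions; the Q values are passed in as plain numbers rather than computed by networks.

```python
def td_targets(rewards, next_q_values, gamma=0.99):
    """Q_target for each sample: r_t + gamma * Q'(s_{t+1}, a_{t+1}).

    next_q_values are assumed to come from the delayed (target) critic.
    """
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(q_values, targets):
    """Mean squared error between the critic's Q(s_t, a_t) and the targets."""
    if not q_values or len(q_values) != len(targets):
        raise ValueError("q_values and targets must be non-empty and aligned")
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

For instance, rewards `[1.0, 2.0]`, next-step values `[10.0, 0.0]` and `gamma=0.5` give targets `[6.0, 2.0]`; if the critic currently predicts `[5.0, 2.0]`, the loss is 0.5.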
The actions a_t and a_{t+1} must each be a deterministic action output directly by the policy network and then evaluated by the Q network, so the loss function is characterized as:
Figure 336479DEST_PATH_IMAGE042
Figure 540058DEST_PATH_IMAGE043
where Ψ(s_t) denotes the action that the actor network Ψ-net, with its own parameters, outputs for state s_t. The actor network Ψ-net is trained to output a better action in each time slot, so that the critic network Φ-net outputs a higher value of the corresponding Q function. A better action means that the action a output by the actor network Ψ-net in a state s makes Q(s, a) larger and larger. To increase Q(s, a), Ψ-net is therefore updated by gradient ascent: the average Q-function value over all state-action pairs is computed first, and Ψ-net is then updated by gradient ascent on it, specifically:
Figure 414474DEST_PATH_IMAGE044
Figure 119124DEST_PATH_IMAGE045
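The gradient-ascent update of the actor can be demonstrated on a toy problem. This is not the patent's network: the critic is replaced by a known closed-form function Q(s, a) = -(a - 2s)^2 and the actor by a single weight w with a = w * s, both chosen only so that the chain rule dQ/dw = dQ/da * da/dw can be written out by hand. All names and values are illustrative.

```python
def actor_gradient(w, states):
    """Gradient of the average Q value with respect to the actor weight w."""
    grads = []
    for s in states:
        a = w * s                      # deterministic action of the toy actor
        dq_da = -2.0 * (a - 2.0 * s)   # derivative of the toy critic in a
        grads.append(dq_da * s)        # chain rule: dQ/da * da/dw
    return sum(grads) / len(grads)


def train_actor(w, states, lr=0.1, steps=200):
    """Repeated gradient ascent on the average Q value."""
    for _ in range(steps):
        w += lr * actor_gradient(w, states)  # ascent: move along the gradient
    return w
```

Because the toy critic is maximized by a = 2s, the weight converges to 2 from any starting value, for example `train_actor(0.0, [0.5, 1.0, 1.5])`.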
In addition to the updates of the two main networks, and in order to let the mobile sensor agent discover more potentially good actions, this embodiment also introduces an exploration strategy that combines Ornstein-Uhlenbeck noise with the ϵ-greedy strategy; the specific algorithm is shown in fig. 5.
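A time-correlated exploration noise of the Ornstein-Uhlenbeck type can be sketched as follows. The algorithm in fig. 5 is not reproduced in this text, so the way the noise is combined with ϵ-greedy here (perturb the action with probability ϵ, otherwise keep the actor's output) is only one possible reading and is an assumption, as are the class name `OUNoise`, the function `explore` and all default parameter values.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process for a scalar action component."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu                      # start at the long-run mean
        self.rng = random.Random(seed)

    def sample(self):
        # Mean-reverting drift plus a Gaussian shock; successive samples
        # are correlated because the state x is carried over.
        drift = self.theta * (self.mu - self.x) * self.dt
        shock = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + shock
        return self.x


def explore(action, noise, epsilon, rng):
    """With probability epsilon add OU noise, then clip to [-1, 1]."""
    if rng.random() < epsilon:
        action = action + noise.sample()
    return max(-1.0, min(1.0, action))
```

Setting `epsilon` to 0 returns the actor's action unchanged, which corresponds to the behavior after training when the parameters are fixed.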
Compared with the prior art, the beneficial effects of this embodiment are:
(1) the method generalizes well: with the trained parameter model, the mobile sensor can locate a better collection point for energy collection and data transmission by itself, even in an unknown area;
(2) a two-layer fully connected deep neural network is used to approximate the value function, and an actor-critic algorithm is adopted to solve the policy optimization problem in a continuous action space;
(3) exploration proceeds without knowledge of the energy distribution or any other information about the energy sources: for different initial positions of the mobile sensor and different positions of the data receiver (sink), the mobile sensor intelligently searches for an optimal trajectory to reach the theoretically optimal position for energy collection and data transmission; a high movement cost per unit distance may prevent the mobile sensor from identifying the optimal trajectory and may lead to a suboptimal solution;
(4) a continuous movement model is adopted to suit real physical environments with momentum, in which the moving speed and the rotation angle can take any value within a given range. Accordingly, a time-correlated exploration strategy is provided to match the physical control process of a mobile sensor with inertia, which improves exploration efficiency during training;
(5) the method is of technical significance for making energy-harvesting wireless sensor networks self-sustainable and for coping with the randomness and unpredictability of environmental energy, and is valuable for large-scale deployment and use of wireless sensor networks, for maximizing the use of environmental resources, and for reducing deployment cost.
Example two
The mobile sensor intelligent track design method based on sustainable data acquisition provided by this embodiment differs from Example one as follows:
this embodiment mainly verifies the effectiveness of the proposed training algorithm.
A 10 × 10 two-dimensional rectangular coordinate system is set up, in which (x, y) denotes a position on the two-dimensional plane (x, y ∈ [0, 10]). Energy source 1 and energy source 2 are located at (0, 10) and (0, 0), respectively. The mobile sensor (MS) can reposition itself within the area to find better locations for energy collection and data transmission, with its moving speed limited to [0, 1] and its turning angle limited to [-π/2, π/2]. Here μ and σ denote the mean and variance of the random distribution of an energy source, and δ denotes the movement cost of the mobile sensor (i.e., energy δ is consumed per unit distance traveled). Next, the validity of the proposed training algorithm (Algorithm 1) is verified by examining the trajectory learned by the mobile sensor and the convergence of the algorithm. The effect of different network parameters (such as the initial position of the mobile sensor, the movement cost per unit distance and the position of the data receiver) on the performance of Algorithm 1 is also studied. Simulations were performed for the following scenarios:
Scenario 1: the data receiver is located at (10, 10); the means of energy source 1 and energy source 2 are 80 and 30, respectively, and their variances are both 1; the movement cost δ is set to 0.1; the initial positions of the mobile sensor are set to (5, 5) and (5, 0). As shown by the trajectories of line segment 1 and line segment 2 in fig. 6, the mobile sensor finally stays at the data receiver (10, 10).
Scenario 2: the data receiver is located at (5, 10); the means of energy source 1 and energy source 2 are 80 and 30, respectively, and their variances are both 1; the movement cost δ is set to 0.1; the initial positions of the mobile sensor are set to (10, 5) and (10, 0). As shown by the trajectories of line segment 3 and line segment 4 in fig. 6, the mobile sensor finally stays at the data receiver (5, 10).
Scenario 3: the data receiver is located at (10, 10); the means of energy source 1 and energy source 2 are 80 and 30, respectively, and their variances are both 1; the movement cost δ is set to 1; the initial positions of the mobile sensor are set to (5, 5) and (1, 5). As shown by the trajectories of line segment 5 and line segment 6 in fig. 6, the mobile sensor finally stays at the sub-optimal position, energy source E1 at (0, 10).
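The three simulation scenarios can be written down as a small configuration table, which makes the differences between them easy to compare. The numbers are taken from the scenario descriptions; the second pair of per-source values (both 1) is read here as the variance σ, which is an interpretation of the text, and the names `Scenario`, `SOURCES` and `SCENARIOS` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    receiver: tuple           # position of the data receiver
    source_means: tuple       # mean energy of (E1, E2)
    source_variances: tuple   # assumed reading: variance of (E1, E2)
    move_cost: float          # delta, energy per unit distance
    initial_positions: tuple  # starting points of the mobile sensor


# Energy source positions in the 10 x 10 area.
SOURCES = {"E1": (0.0, 10.0), "E2": (0.0, 0.0)}

SCENARIOS = {
    1: Scenario((10.0, 10.0), (80.0, 30.0), (1.0, 1.0), 0.1,
                ((5.0, 5.0), (5.0, 0.0))),
    2: Scenario((5.0, 10.0), (80.0, 30.0), (1.0, 1.0), 0.1,
                ((10.0, 5.0), (10.0, 0.0))),
    3: Scenario((10.0, 10.0), (80.0, 30.0), (1.0, 1.0), 1.0,
                ((5.0, 5.0), (1.0, 5.0))),
}
```

Scenarios 1 and 3 share the same receiver position and differ in the movement cost δ (0.1 versus 1), which is the parameter the text links to the sub-optimal outcome in scenario 3.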
In this embodiment, the problem of maximizing the data throughput of energy-harvesting wireless sensor networks is studied. Reinforcement learning is used to address the unknown dynamics of the energy supply in each time slot, so that the mobile sensor learns and optimizes its trajectory merely by tracking the total amount of energy received in the current time slot. The DDPG algorithm is further used to handle a continuous, deterministic action space. The results show that, regardless of the initial position of the mobile sensor and the position of the receiver, the method is able to identify an optimal trajectory with respect to the goal of maximizing the long-term average data throughput from the mobile sensor to the data receiver. The results also indicate that a high movement cost per unit distance may prevent the mobile sensor from identifying an optimal trajectory, sometimes resulting in a sub-optimal solution.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A mobile sensor intelligent track design method based on sustainable data acquisition is characterized by comprising the following steps:
S1, a mobile sensor collects energy from the surrounding environment and uses the collected energy for its own movement and for data transmission;
S2, a model relating energy collection and data transmission is constructed based on a Markov decision process;
S3, the constructed model is solved with a deep deterministic policy gradient algorithm to obtain the optimal joint strategy for trajectory movement and energy (power) allocation.
2. The mobile sensor intelligent track design method based on sustainable data acquisition according to claim 1, wherein in step S1 the mobile sensor collects energy from the surrounding environment specifically as follows: time is divided into equally spaced time slots, and the mobile sensor collects energy in each time slot.
3. The mobile sensor intelligent track design method based on sustainable data acquisition according to claim 2, wherein the model constructed in step S2 consists of T time slot models; each of the T time slot models comprises a data transmission model, an energy collection model and a mobility model.
4. A sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 3, wherein the data transmission model is expressed as:
Figure 458766DEST_PATH_IMAGE001
wherein J_t denotes the data transmission model; p_t denotes the transmission power used for data transmission; l_t denotes the position of the mobile sensor; l_s denotes the position of the data receiver; ||l_t - l_s|| denotes the distance from the mobile sensor to the data receiver; and α denotes the distance loss factor.
5. The sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 4, wherein the energy collection model is represented as:
Figure 122966DEST_PATH_IMAGE002
wherein
Figure 645214DEST_PATH_IMAGE003
represents the amount of energy harvested from energy source E1 by the mobile sensor at time t;
Figure 078469DEST_PATH_IMAGE004
represents the amount of energy harvested from energy source E2 by the mobile sensor at time t;
Figure 027971DEST_PATH_IMAGE005
represents the coordinate position of energy source E1;
Figure 913887DEST_PATH_IMAGE006
represents the coordinate position of energy source E2; β represents the upper bound on the energy the mobile sensor can harvest; and
Figure 505405DEST_PATH_IMAGE007
represents the total amount of energy that the mobile sensor can harvest from the two energy sources at position l at time t.
6. The sustainable data collection-based intelligent trajectory design method for mobile sensors according to claim 5, wherein the mobility model comprises a mobile consumption model and a location update model;
the mobile consumption model is represented as:
Figure 793167DEST_PATH_IMAGE008
wherein C_t represents the movement consumption of the mobile sensor; ρ_t represents the moving speed of the mobile sensor;
Figure 913570DEST_PATH_IMAGE009
an energy consumption value representing a unit speed;
the location update model is represented as:
Figure 755624DEST_PATH_IMAGE010
wherein x_{t+1} and y_{t+1} represent the coordinates of the mobile sensor in the two-dimensional coordinate system at the next time t+1, taking t as the reference; x_t and y_t represent the position coordinates of the mobile sensor at time t; φ_t represents the direction at time t; and
Figure 764817DEST_PATH_IMAGE011
represents the moving direction of the mobile sensor.
7. The sustainable data collection-based intelligent trajectory design method for mobile sensors according to claim 6, wherein each time slot model is represented as:
Figure 782451DEST_PATH_IMAGE012
wherein B_{t+1} represents each time slot model; B_max represents the maximum battery capacity of the mobile sensor; and B_t represents the remaining battery energy of the mobile sensor.
8. The sustainable data acquisition-based intelligent trajectory design method for mobile sensors according to claim 7, wherein the T time slot models are expressed as:
Figure 463968DEST_PATH_IMAGE013
subject to the constraints:
Figure 199843DEST_PATH_IMAGE014
Figure 992218DEST_PATH_IMAGE015
Figure 864360DEST_PATH_IMAGE016
wherein l_t and l_{t+1} represent the positions of the mobile sensor at time t and time t+1, respectively; and p_{t+1} represents the energy used by the mobile sensor for data transmission at time t+1.
CN202110916516.0A 2021-08-11 2021-08-11 Mobile sensor intelligent track design method based on sustainable data acquisition Active CN113365222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916516.0A CN113365222B (en) 2021-08-11 2021-08-11 Mobile sensor intelligent track design method based on sustainable data acquisition

Publications (2)

Publication Number Publication Date
CN113365222A true CN113365222A (en) 2021-09-07
CN113365222B CN113365222B (en) 2021-11-12

Family

ID=77522923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916516.0A Active CN113365222B (en) 2021-08-11 2021-08-11 Mobile sensor intelligent track design method based on sustainable data acquisition

Country Status (1)

Country Link
CN (1) CN113365222B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825101A (en) * 2021-11-24 2021-12-21 浙江师范大学 Charging trolley track design method based on heterogeneous wireless sensor network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106452513A (en) * 2016-10-09 2017-02-22 华侨大学 Mobile energy supplementation method in delay-constrained wireless sensor network
CN107105467A (en) * 2017-05-16 2017-08-29 河海大学常州校区 A kind of High Availabitity wireless sensor network mobile data collection method
CN108882195A (en) * 2018-06-20 2018-11-23 天津大学 Collaboration data collection method of the wireless sensor network based on mobile destination node
CN112702688A (en) * 2020-07-01 2021-04-23 南京林业大学 Mobile car planning method combining energy supplement and data collection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李奇真: "Research on resource management based on Markov decision processes in wireless networks", China Doctoral Dissertations Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113825101A (en) * 2021-11-24 2021-12-21 浙江师范大学 Charging trolley track design method based on heterogeneous wireless sensor network
CN113825101B (en) * 2021-11-24 2022-02-15 浙江师范大学 Charging trolley track design method based on heterogeneous wireless sensor network

Also Published As

Publication number Publication date
CN113365222B (en) 2021-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant