CN112867023B

CN112867023B - Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal

Info

Publication number: CN112867023B
Application number: CN202011613662.8A
Authority: CN
Inventors: 刘驰; 戴子彭
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-11-19
Anticipated expiration: 2040-12-30
Also published as: CN112867023A

Abstract

The invention provides a method for minimizing sensing data acquisition delay by dynamically scheduling unmanned terminals, comprising the following steps: step 1, a base station main process sets up a distributed priority experience multiplexing pool and initializes the data acquisition delay of all sensors; step 2, the base station starts subprocesses that simulate the unmanned terminal trajectories and the resulting data acquisition delays, and upload the results to the distributed priority experience multiplexing pool; step 3, based on the distributed priority experience multiplexing pool, the base station solves for the optimal trajectory that minimizes the sensing data acquisition delay using a quantile error matrix algorithm; and step 4, the base station sends dynamic scheduling instructions for the optimal trajectory to the unmanned terminal to obtain the latest data of the sensors in the sensing area. With this technical scheme, the base station can dynamically schedule the unmanned terminals and quickly explore unknown conditions in the mobile crowd sensing scene, which speeds up policy optimization and improves the minimization of sensing data acquisition delay.

Description

Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal

Technical Field

The invention belongs to the field of mobile unmanned crowd sensing, and particularly relates to a method for minimizing sensing data acquisition delay through dynamically scheduling an unmanned terminal.

Background

Mobile crowd sensing supports the sensing data acquisition needs of smart cities. It uses crowd intelligence to collect large amounts of data in the urban environment and has high practical value for problems such as traffic monitoring and early warning of natural disasters. Traditional crowd sensing is human-centered and relies on portable terminals (such as the iPhone and iWatch). In recent years, mobile unmanned crowd sensing, which is centered on mobile unmanned terminals (such as unmanned aerial vehicles and unmanned vehicles), has become able to provide smart cities with sensing data acquisition services of wider applicability and higher data quality. As shown in fig. 1, sensor nodes that generate different types of data, such as surveillance cameras, Wi-Fi routers, and temperature sensors in buildings, are distributed throughout a smart city. A new generation of unmanned terminals carries multiple smart antennas, which lets a terminal collect data from several sensor nodes simultaneously at a higher transmission rate, something that is difficult to achieve with traditional crowd sensing alone. However, most research on mobile unmanned crowd sensing focuses only on planning terminal trajectories, seeking behavior that maximizes the amount of sensing data collected while minimizing energy consumption. Many real-world crowd sensing tasks, such as environmental monitoring and traffic control, require data to be acquired and uploaded to a data center as close to real time as possible. Data acquisition in such scenarios is therefore time-sensitive, and its timeliness needs to be maintained by dynamically scheduling the unmanned terminals.

The commonly used data acquisition delay minimization algorithm based on a random process, and the expectation maximization algorithm based on a hidden Markov model, can only handle the scheduling of a single unmanned terminal and serve only a small number of nodes. They assume that every sensor has the same channel gain and that the data transmission success rate follows a random process under a gamma distribution. These methods cannot guarantee minimal sensing data acquisition delay in a mobile unmanned crowd sensing scene.

Chinese patent application No. 202010374780.1 discloses a path planning method for an unmanned aerial vehicle group. The method first obtains information about the unknown environment, including the positions of the starting point and the target point, the coordinates of obstacles, and possible radar and missile risks. When planning a path, the unmanned aerial vehicle group also takes into account the performance of the vehicles themselves, such as deflection angle, pitch angle, and flight height. On this basis, each unmanned aerial vehicle selects a preferred path using a particle swarm optimization algorithm, which achieves path planning for the whole cluster. That invention enables path planning for an unmanned aerial vehicle cluster in a risk-dense environment, so that the cluster can fly efficiently and complete tasks cooperatively. It studies cluster path planning with the aim of improving the safety and efficiency of the air traffic system, which is significant for ensuring flight safety, reducing flight cost, increasing airspace capacity, and improving the operating efficiency of the air traffic system. However, it does not provide a method for minimizing sensing data acquisition delay by dynamically scheduling unmanned terminals such as unmanned aerial vehicles and unmanned vehicles.

Chinese patent application No. 2020108210037 discloses a method and a device for planning an unmanned aerial vehicle area trajectory, and a readable storage medium. The method comprises: acquiring the flight area of the unmanned aerial vehicle; judging whether the flight area is a concave polygonal area; if so, dividing it into a transverse flight area and a longitudinal flight area that are perpendicular to each other; planning the area trajectories of the transverse and longitudinal flight areas separately; and connecting the track points of the area trajectories to generate the planned flight trajectory. By judging the concavity or convexity of the planning area and dividing the flight area, the method avoids gaps between areas, solves the problem that existing direct back-and-forth flight does not plan the flight path correctly, and connects the trajectories of the different flight areas to complete the planning. An optimal flight trajectory is thus planned for the concave polygonal area, which improves operating efficiency and saves operating time during normal operation of the unmanned aerial vehicle. This application likewise does not provide a method for minimizing sensing data acquisition delay by dynamically scheduling unmanned terminals such as unmanned aerial vehicles and unmanned vehicles.

Disclosure of Invention

To fill this gap in the prior art, the invention provides a method for minimizing sensing data acquisition delay by dynamically scheduling an unmanned terminal.

The technical scheme for solving the problems is as follows:

The method for minimizing the perception data acquisition delay through the dynamic scheduling of the unmanned terminal comprises the following steps:

step 1, a base station main process sets up a distributed priority experience multiplexing pool and initializes the data acquisition delay of all sensors;

step 2, the base station starts subprocesses that simulate the unmanned terminal trajectories and data acquisition delays and upload the results to the distributed priority experience multiplexing pool;

step 3, based on the distributed priority experience multiplexing pool, the base station solves for the optimal trajectory that minimizes the sensing data acquisition delay using a quantile error matrix algorithm;

step 4, the base station sends dynamic scheduling instructions for the optimal trajectory to the unmanned terminal to obtain the latest data of the sensors in the sensing area.

Further, in step 1, the base station main process setting up the distributed priority experience multiplexing pool and initializing the data acquisition delays of all sensors includes:

an empty distributed priority experience multiplexing pool is established on a base station of the mobile crowd sensing scene, then a plurality of subprocesses are established, and the simulation environment parameters in each subprocess are initialized, wherein the simulation environment parameters comprise the position of the unmanned terminal, the electric quantity of the unmanned terminal, the positions of the sensors, and the data acquisition delay of each sensor.

Further, in step 2, the base station starting subprocesses to simulate the unmanned terminal trajectories and data acquisition delays and uploading the results to the distributed priority experience multiplexing pool includes:

step 201, the base station starts each subprocess; each subprocess starts a new round and monitors and maintains its own simulation environment; the unmanned terminal adopts an ε-greedy strategy at each step, and the base station dynamically schedules all unmanned terminals in the environment according to the dynamically selected ε parameter;

step 202, each subprocess executes asynchronously and simulates, in its own simulation environment, the changes in the unmanned terminal trajectory and in the sensing data acquisition delays of the sensors; when a subprocess detects in some round that the unmanned terminal has collided with an obstacle or run out of electric quantity, it immediately ends that round and re-initializes its simulation environment parameters;

otherwise, at the current time t, the unmanned terminal observes the current situation state s_t and performs a movement and data acquisition action a_t: it moves to the positions of the sensors in the current environment whose data acquisition delay needs to be reset and resets the current data acquisition delay of those sensors to 1, while the data acquisition delays of the other sensors accumulate automatically; the current situation reward r_t is calculated from the change in the data acquisition delay on each sensor, and the subprocess then sends its situation state s_t, the movement and data acquisition action a_t, and the calculated current situation reward r_t to the distributed priority experience multiplexing pool.
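The delay bookkeeping in step 202 can be illustrated with a minimal sketch. The reset-to-1 and accumulate-by-1 rules follow the text; the function names and the reward (the drop in total delay) are assumptions made for illustration only, since the exact reward definition is not given here.

```python
def step_delays(delays, collected):
    """Advance per-sensor data acquisition delays by one time step.

    delays: list with the current delay of each sensor.
    collected: set of sensor indices whose data was collected in this step.
    Collected sensors are reset to 1; all other delays accumulate by 1.
    """
    return [1 if n in collected else d + 1 for n, d in enumerate(delays)]


def delay_reward(old_delays, new_delays):
    # Hypothetical reward (an assumption): reduction in total delay.
    return sum(old_delays) - sum(new_delays)


delays = [3, 5, 2, 7]
new_delays = step_delays(delays, collected={1, 3})
reward = delay_reward(delays, new_delays)
print(new_delays, reward)
```

A subprocess would then upload the tuple (s_t, a_t, r_t) built from such values to the pool.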

Further, in step 3, the base station solving for the optimal trajectory that minimizes the sensing data acquisition delay by using the quantile error matrix algorithm based on the distributed priority experience multiplexing pool includes:

step 301, the base station judges whether the distributed priority experience multiplexing pool has collected experience data filling 10% of its storage space; if not, the subprocesses continue to execute asynchronously until experience data filling 10% of the storage space has been obtained;

step 302, once experience data filling 10% of the storage space exists in the distributed priority experience multiplexing pool, a batch of experience data is sampled from the pool according to the existing priorities w;

step 303, the main process of the base station calculates the current best action from the batch of experience data and a first calculation model, formula (1);

in formula (1), the random multiplier is uniformly distributed between 0 and 0.25, the value term is obtained by value sampling, G is the number of samples, and argmax denotes selecting the vector element with the largest expected value;

step 304, the base station sends the current best action to the unmanned terminal in the simulation environment, the unmanned terminal acts according to the current best action and moves to the next situation state s_{t+1}, and the main process of the base station calculates the next best action according to the first calculation model;

step 305, from the current situation state s_t, the current movement and data acquisition action a_t, the next situation state s_{t+1}, and the next best action, the main process of the base station calculates each element TD_ij of the error matrix TD according to a second calculation model, formula (2);

in formula (2), TD_ij is an element of the error matrix TD, r_t is the current situation reward, the current and next value terms are obtained by independent uniform sampling of the current and next local distributions respectively, and γ is the attenuation factor;

step 306, the main process of the base station updates the parameters by gradient descent on a third calculation model, formula (3);

in formula (3), the objective function is the Huber function, a commonly used quantile error function;

step 307, the main process of the base station updates the priority w_t of the priority experience multiplexing pool from the elements TD_ij of the error matrix TD and a fourth calculation model, formula (4);

in formula (4), w_min is a given lower limit on the priority;

step 308, the main process of the base station initializes the simulation environment and, based on the optimized priorities w_t of the priority experience multiplexing pool, outputs the best action for each step of the unmanned terminal; after the unmanned terminal has executed T steps, the executed action sequence is output as the optimal trajectory that minimizes the sensing data acquisition delay in the current environment.
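Steps 303 to 305 can be sketched in code. Formulas (1) and (2) are not shown here, so the sketch below follows the standard implicit quantile network formulation and is an assumption rather than the patent's exact models: the action value is the mean of G quantile values drawn with a multiplier uniform on [0, 0.25], and TD_ij = r_t + γ · (next quantile j) − (current quantile i). The names `best_action`, `td_matrix` and `toy_quantile_value` are hypothetical.

```python
import random


def best_action(quantile_value, state, actions, G=8, tau_max=0.25, rng=random):
    """Pick the action with the largest mean of G sampled quantile values."""
    # The same G multipliers, uniform on [0, tau_max], are used for every action.
    taus = [rng.uniform(0.0, tau_max) for _ in range(G)]

    def expected_value(action):
        return sum(quantile_value(state, action, tau) for tau in taus) / G

    return max(actions, key=expected_value)


def td_matrix(reward, gamma, current_quantiles, next_quantiles):
    """TD[i][j] = reward + gamma * next_quantiles[j] - current_quantiles[i]."""
    return [[reward + gamma * zn - zc for zn in next_quantiles]
            for zc in current_quantiles]


def toy_quantile_value(state, action, tau):
    # Toy stand-in for a trained network: action 0 is safe, action 1 is risky.
    return 1.0 + 4.0 * tau if action == 0 else -2.0 + 12.0 * tau


chosen = best_action(toy_quantile_value, state=None, actions=[0, 1],
                     rng=random.Random(0))
td = td_matrix(reward=1.0, gamma=0.5,
               current_quantiles=[1.0, 2.0], next_quantiles=[2.0, 4.0])
print(chosen, td)
```

Restricting the multiplier to the lower quarter of the distribution makes the choice risk-averse: in the toy example the safe action wins even though the risky action has the higher mean over the full range [0, 1].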

Further, in step 4, the base station sending dynamic scheduling instructions for the optimal trajectory to the unmanned terminal to acquire the latest data of the sensors in the sensing area includes:

in a mobile unmanned crowd sensing technical scene, a base station sends a series of control instructions to an unmanned terminal according to the currently calculated optimal track, the unmanned terminal is scheduled to continuously acquire the latest data of the sensors in the sensing area through dynamic movement, and the data are sent back to the base station.

Compared with the prior art, the invention has the beneficial effects that:

1. The method for minimizing perception data acquisition delay by dynamically scheduling an unmanned terminal uses a mobile unmanned terminal carrying a plurality of smart antennas, which moves back and forth and collects data to continuously update the data acquisition state of every node of the sensor network.

2. The method is based on a data upload model under multi-user multi-antenna communication technology, and models the whole problem as minimizing the data acquisition delay of all sensors, thereby ensuring the real-time performance of sensing data acquisition in the sensor network.

3. Practical experimental verification shows that using a plurality of unmanned terminals as base stations, scheduled by the "DRL-freshMCS" algorithm for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminals, achieves a better optimization of the perception data acquisition delay than deploying a large number of fixed base stations, and can be widely used in scenes with large areas, complex channels, and interference that is difficult to model.

4. The method collects experience by constructing a plurality of asynchronous processes under different ε-greedy strategies, and then samples the data used for training the neural network based on priority.

Drawings

Fig. 1 is a detailed view of a mobile unmanned crowd sensing scenario in which an unmanned terminal serves as a temporary base station of a sensor network;

Fig. 2 is a schematic diagram of DRL-freshMCS, the deep reinforcement learning-based unmanned terminal dynamic scheduling algorithm of the present invention;

Fig. 3 is a schematic diagram of the influence of the number of antennas on the sensing data acquisition delay under the method of the present invention;

Fig. 4 is a schematic diagram of the influence of the upload data packet size on the sensing data acquisition delay under the method of the present invention;

Fig. 5 is a schematic diagram of the influence of the number of sensor nodes on the sensing data acquisition delay under the method of the present invention;

Fig. 6 is a schematic diagram of the influence of the number of unmanned terminals on the sensing data acquisition delay under the method of the present invention;

Fig. 7 is a schematic diagram of the minimization of sensing data acquisition delay in different sub-regions, together with the trajectory of an unmanned terminal over a period of time, under the method of the present invention;

Fig. 8 is a schematic diagram of the minimization of sensing data acquisition delay over a complete round under the method of the present invention.

Detailed Description

The method for minimizing the perceptual data acquisition delay by dynamically scheduling the unmanned terminal according to the present invention is further described in detail with reference to fig. 1 to 8 of the specification.

With reference to fig. 1, under a limited electric quantity, mobile unmanned terminals carrying a plurality of smart antennas are scheduled to move and collect data so as to minimize the data acquisition delay of each sensor node, which ultimately ensures the real-time performance of the sensor network's data. Based on multi-user multi-antenna communication theory, the invention constructs a data upload model from a plurality of sensors to an unmanned terminal, and uses an implicit quantile network to provide more accurate value estimates and a more stable policy for optimizing the trajectory of the mobile unmanned terminal and selecting the sensors to upload from, which further improves the optimization of sensing data acquisition delay.

With reference to fig. 2, the method for minimizing the perceptual data acquisition delay by dynamically scheduling an unmanned terminal according to the present invention includes:

step 1, a base station main process sets a distributed prior experience multiplexing pool and initializes the data acquisition time delay of all sensors:

An empty distributed priority experience multiplexing pool is established on a base station of the mobile unmanned crowd sensing scene, then a plurality of subprocesses are established, and the simulation environment parameters of each subprocess are initialized.

Specifically, with reference to fig. 1, this embodiment establishes a mobile unmanned crowd sensing scene that requires real-time data. M unmanned terminals are deployed as mobile base stations, each equipped with U smart antennas. They move continuously to collect data from N randomly distributed single-antenna sensor nodes, and in doing so repeatedly reset the current data acquisition delays of some sensor nodes. In addition, tall buildings are defined as obstacles that the unmanned terminals must avoid. The scene therefore contains two main sets: the set of unmanned terminals and the set of sensors.

For data collection, a data transmission model is established in which a plurality of sensor nodes upload to a single unmanned terminal base station in multi-user multiple-input multiple-output (MU-MIMO) mode, and the corresponding data transmission rate is obtained. To make the model more realistic, a two-dimensional plane coordinate system is established in which the position of each unmanned terminal m at time t is represented by a plane vector at a fixed height h, the position of each fixed sensor node n is denoted pⁿ = [xⁿ, yⁿ, 0], and the distance between unmanned terminal m and sensor n is expressed as a function of time.

This embodiment divides the whole mobile crowd sensing task into T discrete time steps of equal duration τ_total. Each τ_total is divided between two activities: movement of the unmanned terminal and acquisition of sensor data (data uploading). In each time step t, each unmanned terminal m moves a certain distance along a chosen direction at a given speed, which determines the time required for moving. The movement trajectory of each unmanned terminal m is thus represented as T consecutive line segments. The unmanned terminal then uses the remaining time in the step to act as a temporary base station, collecting data from surrounding sensors and resetting the sensing data acquisition delays of these sensors, as shown in step 2.
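The geometry and the per-step time budget described above can be sketched as follows. This is a minimal illustration under assumptions: the distance is taken as the Euclidean distance between a terminal at height h and a sensor on the ground, the movement time as distance divided by speed, and the collection time as whatever remains of τ_total. Function names and the sample numbers are hypothetical.

```python
import math


def distance_3d(terminal_xy, sensor_xy, height):
    """Euclidean distance between a terminal at a fixed height and a ground sensor."""
    dx = terminal_xy[0] - sensor_xy[0]
    dy = terminal_xy[1] - sensor_xy[1]
    return math.sqrt(dx * dx + dy * dy + height * height)


def split_time_step(move_distance, speed, tau_total):
    """Return (move_time, collect_time) for one time step of length tau_total."""
    move_time = move_distance / speed
    if move_time > tau_total:
        raise ValueError("movement does not fit in one time step")
    return move_time, tau_total - move_time


# Illustrative values only (not taken from the embodiment).
d = distance_3d((30.0, 40.0), (0.0, 0.0), height=120.0)
move_time, collect_time = split_time_step(move_distance=250.0, speed=25.0,
                                          tau_total=20.0)
print(d, move_time, collect_time)
```

The longer a terminal travels in a step, the less time it has left to serve as a temporary base station, which is the trade-off the scheduler has to balance.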

Step 2, the base station starts subprocesses to simulate the unmanned terminal trajectories and data acquisition delays and uploads the results to the distributed priority experience multiplexing pool:

step 201, the base station starts each subprocess; each subprocess starts a new round and monitors and maintains its own simulation environment; the unmanned terminal adopts an ε-greedy strategy at each step, and the base station dynamically schedules all unmanned terminals in the environment according to the dynamically selected ε parameter;

step 202, each subprocess executes asynchronously and simulates, in its own simulation environment, the changes in the unmanned terminal trajectory and in the sensing data acquisition delays of the sensors; when a subprocess detects in some round that the unmanned terminal has collided with an obstacle or run out of electric quantity, it immediately ends that round and re-initializes its simulation environment parameters;

otherwise, at the current time t, the unmanned terminal observes the current situation state s_t and performs a movement and data acquisition action a_t: it moves to the positions of the sensors in the current environment whose data acquisition delay needs to be reset and resets the current data acquisition delay of those sensors to 1, while the data acquisition delays of the other sensors accumulate automatically; the current situation reward r_t is calculated from the change in the data acquisition delay on each sensor, and the subprocess then sends its situation state s_t, the movement and data acquisition action a_t, and the current situation reward r_t to the distributed priority experience multiplexing pool.
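How the uploaded tuples might be held and drawn from the pool can be sketched with a toy in-memory structure. This is an assumption-laden illustration, not the distributed implementation: the class name `PriorityPool`, the default priority of 1.0 for new entries, and the drop-oldest eviction are all hypothetical; only the 10% warm-up threshold and priority-based sampling come from the text.

```python
import random


class PriorityPool:
    """Toy single-process stand-in for the priority experience multiplexing pool."""

    def __init__(self, capacity, warmup_fraction=0.10):
        self.capacity = capacity
        # Sampling starts only once this many entries are stored (10% by default).
        self.warmup = int(capacity * warmup_fraction)
        self.items = []       # (state, action, reward) tuples
        self.priorities = []  # one priority w per stored tuple

    def push(self, experience, priority=1.0):
        if len(self.items) >= self.capacity:
            # Assumed eviction rule: drop the oldest entry when full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(experience)
        self.priorities.append(priority)

    def ready(self):
        return len(self.items) >= self.warmup

    def sample(self, batch_size, rng=random):
        # Draw indices with probability proportional to priority.
        idx = rng.choices(range(len(self.items)),
                          weights=self.priorities, k=batch_size)
        return idx, [self.items[i] for i in idx]


pool = PriorityPool(capacity=20)
pool.push(("s0", "a0", 0.0))
ready_after_one = pool.ready()
pool.push(("s1", "a1", 1.0), priority=0.0)
ready_after_two = pool.ready()
indices, batch = pool.sample(4, rng=random.Random(0))
print(ready_after_one, ready_after_two, indices)
```

In the example the second entry has priority 0, so it is never drawn; a real pool would keep priorities above a lower limit so that every experience retains some chance of being sampled.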

Step 3, based on the distributed priority experience multiplexing pool, the base station solves for the optimal trajectory that minimizes the sensing data acquisition delay using the quantile error matrix algorithm:

Steps 301 to 308 proceed as set out in the Disclosure of Invention above. The first calculation model, formula (1), yields the current best action; in it, the random multiplier is uniformly distributed between 0 and 0.25, G is the number of samples, and argmax denotes selecting the vector element with the largest expected value. The second, third and fourth calculation models, formulas (2) to (4), yield the error matrix TD, the parameter update, and the priority update respectively; in formula (2) the value terms are obtained by independent uniform sampling of the local distributions, and in formula (4) w_min is a given lower limit on the priority.
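Formulas (3) and (4) are not shown here, so the following is only a sketch under assumptions. It uses the quantile Huber loss as commonly defined for implicit quantile networks, and a hypothetical priority rule that takes the mean absolute TD error bounded below by w_min. The function names, the threshold `kappa`, and the priority rule itself are illustrative and may differ from the patent's third and fourth calculation models.

```python
def huber(delta, kappa=1.0):
    """Huber function: quadratic near zero, linear beyond kappa."""
    a = abs(delta)
    return 0.5 * delta * delta if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(td, taus, kappa=1.0):
    """Sum over current quantiles i of the mean over next quantiles j.

    td[i][j] is the TD error for current quantile taus[i] and next sample j.
    """
    total = 0.0
    for tau_i, row in zip(taus, td):
        row_loss = 0.0
        for delta in row:
            # Asymmetric weight |tau - 1{delta < 0}| of quantile regression.
            weight = abs(tau_i - (1.0 if delta < 0 else 0.0))
            row_loss += weight * huber(delta, kappa) / kappa
        total += row_loss / len(row)
    return total


def updated_priority(td, w_min):
    # Hypothetical rule (an assumption): mean |TD_ij|, never below w_min.
    flat = [abs(d) for row in td for d in row]
    return max(w_min, sum(flat) / len(flat))


td = [[1.0, 2.0], [0.0, -1.0]]
loss = quantile_huber_loss(td, taus=[0.25, 0.5])
w = updated_priority(td, w_min=0.5)
print(loss, w)
```

The lower limit keeps experiences with small errors from being starved of future sampling, which is the role the text assigns to w_min.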

Step 4, the base station sends dynamic scheduling instructions for the optimal trajectory to the unmanned terminal to acquire the latest data of the sensors in the sensing area, thereby minimizing the sensing data acquisition delay in the whole sensor network:

in a mobile unmanned crowd sensing scene, the base station sends a series of control instructions to the unmanned terminal according to the currently calculated optimal trajectory, scheduling the unmanned terminal to move dynamically, continuously acquire the latest data of the sensors in the sensing area, and send the data back to the base station, thereby minimizing the sensing data acquisition delay in the whole sensor network.

Specifically, in the simulation experiment, this embodiment constructs a square service scene with a side length of 1.6 km and defines an initial data acquisition delay for each sensor node. The current data acquisition delay of each sensor node increases by 1 at every time step; in addition, when an unmanned terminal successfully collects the data of the specified sensors within a time step t, the current data acquisition delay of those nodes is reset to 1. The embodiment presets an initial position for each unmanned terminal m, which flies at a fixed height h = 130 m and a speed v_t = 25 m/s. The battery capacity of each unmanned terminal does not exceed E_max = 1100 kJ. The embodiment contains T = 250 time steps, each lasting τ_total = 20 seconds.

In implementing the algorithm, the number of asynchronous processes is set to P = 32, and each process samples the ε of its ε-greedy policy from a Gaussian distribution. The maximum capacity of the priority experience multiplexing pool is set to 10⁶. To ensure the stability of the gradient descent algorithm, the embodiment uses the Adam optimizer with a clipping factor of 40. The deep neural network used for training is the same as the one used by DQN for Atari games, and the LSTM uses 512 neurons, matching the fully connected layer.

The embodiment uses the sensing data acquisition delay of the whole sensor network as the evaluation index: when the whole mobile unmanned crowd sensing task is completed (all T time steps have elapsed), the delay is averaged over all sensor nodes. The sequence length of the LSTM is set to 20, and the implicit quantile sample size is 64.
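The embodiment's settings and evaluation index can be gathered in a small sketch. The numeric values are those stated above; the field names, the reading of P as the number of asynchronous processes, and the helper `mean_delay` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbodimentConfig:
    time_steps: int = 250            # T
    step_seconds: float = 20.0       # tau_total
    height_m: float = 130.0          # h
    speed_mps: float = 25.0          # v_t
    battery_kj: float = 1100.0       # E_max
    processes: int = 32              # P, read here as asynchronous processes
    pool_capacity: int = 10 ** 6     # maximum pool size
    clipping_factor: float = 40.0    # used with the Adam optimizer
    lstm_units: int = 512
    lstm_sequence_length: int = 20
    quantile_samples: int = 64


def mean_delay(final_delays):
    """Evaluation index: delay averaged over all sensor nodes after T steps."""
    return sum(final_delays) / len(final_delays)


cfg = EmbodimentConfig()
task_seconds = cfg.time_steps * cfg.step_seconds
max_step_distance_m = cfg.speed_mps * cfg.step_seconds
score = mean_delay([1, 3, 8])  # made-up delays for three sensors
print(task_seconds, max_step_distance_m, score)
```

With these settings a full task lasts 5000 seconds, and a terminal that spent an entire step moving would cover at most 500 m, under a third of the 1.6 km side of the scene.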

In the comparative experiment described later with reference to fig. 3 to 6, the number U of antennas on the unmanned terminal, the size D of each data packet uploaded by the sensor, the number N of sensor nodes, and the number M of the unmanned terminals are all sequentially used as variables.

The final results of the algorithm are evaluated in detail below against the following six references:

DRL-freshMCS w/o ε-DPER: an ablation of the proposed scheme in which the priority experience multiplexing mechanism is not used during training; priority sampling is replaced by ordinary random sampling, and everything else is the same as in the invention.

DRL-CEWS: a reinforcement learning framework using a distributed computing mechanism and a curiosity-driven exploration mechanism; currently the best method for mobile crowd sensing with unmanned terminals.

A-TP: a DQN-based method; currently the best method for directly minimizing sensing data acquisition delay.

GA: a genetic algorithm that abstracts trajectory planning and task allocation into an optimization problem and solves it by linear programming.

Random: each unmanned terminal m selects its behavior at random.

Theoretical optimum: to verify the feasibility of using mobile unmanned terminals as base stations, fixed base stations far outnumbering the unmanned terminals are deployed to fully cover the task area, which yields the theoretical minimum sensing data acquisition delay.

This embodiment performs four groups of simulation tests in total. The number U of antennas on the unmanned terminal, the size D of each data packet uploaded by a sensor, the number N of sensor nodes, and the number M of unmanned terminals serve as the independent variables, and the dependent variable is the evaluation index defined above, namely the sensing data acquisition delay.

As shown in fig. 3, this embodiment shows the influence of the number of smart antennas mounted on the unmanned terminal on the sensing data acquisition delay. The upload data packet size is fixed at D = 5 Mbits, the number of sensor nodes at N = 256, and the number of unmanned terminals at M = 2, while the number of antennas per unmanned terminal varies from U = 5 to U = 25.

The effects summarized in this example are as follows:

DRL-freshMCS is better than the other five comparison algorithms in the aspect of sensing data acquisition delay (the lower the better). For example, when each unmanned terminal is equipped with 15 antennas, the sensing data acquisition delay of the DRL-freshMCS reaches 8.267, which is improved by 20% compared with 10.316 of the DRL-CEWS, and the DRL-freshMCS are respectively improved by 18%, 15%, 27%, 75% and 81% compared with the DRL-freshMCS w/o e-DPER, DRL-CEWS, a-TP, GA and Random in terms of average performance of the sensing data acquisition delay, and moreover, the sensing data acquisition delay of the DRL-freshMCS is also closest to the theoretical optimal value compared with other algorithms.

2. In this embodiment, the sensing data acquisition delay of most methods decreases monotonically as the number of deployed smart antennas increases. On the one hand, according to MU-MIMO theory, the more antennas there are, the more sensor nodes data can be collected from simultaneously, which directly helps minimize the sensing data acquisition delay of a large-scale sensor network. On the other hand, more antennas directly increase the transmission speed of each sensor on the MIMO channel. Although this speed has a theoretical limit, a higher channel transmission speed greatly reduces the time needed to transmit data of a fixed size, thereby increasing the time available for the unmanned terminal to move within each time step t. The unmanned terminal thus has more time to search for sensor nodes whose data have not yet been collected, so the sensing data acquisition delay of the whole system can be effectively minimized.

For the five comparison methods, although their sensing data acquisition delays also decrease as the number of antennas increases, they all lag behind DRL-freshMCS to varying degrees. DRL-freshMCS benefits from the policy stability brought by IQN, which makes the training and testing results of the neural network highly consistent. The GA method aims first to traverse all sensor nodes and builds its solution in a way similar to solving the "traveling salesman problem"; however, this leads to a very high sensing data acquisition delay, because the data upload model of MU-MIMO theory is not taken into account.
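The 20% figure quoted above for 15 antennas can be checked with a short calculation. The sketch below assumes that "improvement" means the relative reduction in delay with respect to the baseline; the function name is illustrative and is not part of the original method.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline` (lower delay is better)."""
    if baseline <= 0:
        raise ValueError("baseline delay must be positive")
    return (baseline - proposed) / baseline


# Values quoted in the text for 15 antennas: DRL-CEWS 10.316, DRL-freshMCS 8.267.
gain = relative_improvement(10.316, 8.267)
print(f"{gain:.1%}")  # prints 19.9%, i.e. about 20%
```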

As shown in fig. 4, this embodiment shows the influence of the size of the sensor upload packet on the sensing data acquisition delay. The number of unmanned terminal antennas is fixed at U = 10, the number of sensor nodes at N = 256, and the number of unmanned terminals at M = 2, while the size of the sensor upload packet varies from D = 2 Mbits to D = 10 Mbits.

The technical effects summarized in the embodiment are as follows:

The average performance of DRL-freshMCS is better than that of the other five comparison algorithms. For example, when the sensor upload packet size is D = 8 Mbits, the sensing data acquisition delay of DRL-freshMCS is 21.16, a 29% improvement over the 29.51 of the best comparison algorithm, DRL-CEWS. In terms of average sensing data acquisition delay, DRL-freshMCS improves on DRL-freshMCS w/o e-DPER, DRL-CEWS, a-TP, GA, and Random by 22%, 17%, 25%, 71%, and 75%, respectively.

It can be seen that, as D increases, the sensing data acquisition delay of each method increases essentially monotonically. When D is very large, a sensor node needs to upload a very large data packet to update its latest state; that is, each unmanned terminal needs more acquisition time and a higher channel transmission rate to minimize the sensing data acquisition delay. This leaves the unmanned terminal less time to look for unvisited sensors, with the direct consequence that the sensing data acquisition delay of sensors in remote areas cannot be effectively optimized.
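The trade-off between collection time and movement time can be illustrated with a minimal sketch. It assumes that each time step has a fixed duration shared between uploading one packet of size D and moving, and that the upload time is simply D divided by the channel rate. The step duration, the rate value, and the function name are hypothetical and do not come from the original text.

```python
def movement_time(step_duration_s: float, packet_mbits: float, rate_mbps: float) -> float:
    """Time left for moving after uploading one packet within a single time step.

    Assumption: upload time = packet size / channel rate, and the remainder
    of the step (never negative) is available for movement.
    """
    if rate_mbps <= 0:
        raise ValueError("channel rate must be positive")
    upload_time_s = packet_mbits / rate_mbps
    return max(0.0, step_duration_s - upload_time_s)


# Hypothetical numbers: 10 s time step, 2 Mbit/s channel rate.
for d in (2, 5, 10):
    print(d, movement_time(10.0, d, 2.0))
```

With the rate held fixed, a larger D leaves less time for movement; raising the rate has the opposite effect, which matches the reasoning given for the antenna-count experiment.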

In this embodiment it can be observed that DRL-CEWS, currently the best mobile crowd sensing method, performs worse than DRL-freshMCS in this application scenario. This is because DRL-CEWS is based on PPO, that is, it uses an on-policy reinforcement learning algorithm. Lacking an experience multiplexing pool, it has difficulty making effective use of previously traversed experience in a complex decision problem; although training is faster because fewer operations are needed, the model easily converges to a poor local optimum. In this application scenario, besides the trajectory optimization task at which DRL-CEWS is good, there is also the important task of minimizing the data acquisition delay of all sensor nodes. The state and action spaces are huge and accurate value estimation is essential, so IQN and the distributed prioritized experience multiplexing mechanism need to be introduced.
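To make the role of an experience multiplexing pool concrete, the following is a minimal single-process sketch of a prioritized pool. It is not the distributed pool of the invention: it assumes proportional sampling and a priority floor (named `w_min` here after the lower priority limit mentioned in the claims), and the class name, method names, and the use of an absolute error as priority are all hypothetical.

```python
import random
from typing import Any, List, Tuple


class PrioritizedPool:
    """Toy prioritized experience pool with proportional sampling."""

    def __init__(self, capacity: int, w_min: float = 1e-3) -> None:
        self.capacity = capacity
        self.w_min = w_min  # lower bound so every experience stays sampleable
        self.items: List[Tuple[Any, Any, float]] = []
        self.priorities: List[float] = []

    def add(self, state: Any, action: Any, reward: float, error: float) -> None:
        """Store (s_t, a_t, r_t) with priority max(|error|, w_min)."""
        if len(self.items) >= self.capacity:  # drop the oldest entry when full
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append((state, action, reward))
        self.priorities.append(max(abs(error), self.w_min))

    def sample(self, k: int, rng: random.Random) -> List[Tuple[Any, Any, float]]:
        """Draw k experiences with probability proportional to priority."""
        return rng.choices(self.items, weights=self.priorities, k=k)


pool = PrioritizedPool(capacity=3)
pool.add("s0", "a0", 0.1, error=0.0)  # priority is raised to w_min
pool.add("s1", "a1", 0.5, error=2.0)
batch = pool.sample(4, random.Random(0))
```

Because stored experiences can be drawn again on later updates, an off-policy learner can reuse past trajectories, which an on-policy method such as PPO cannot do in the same way.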

As shown in fig. 5, this embodiment shows the influence of the number of sensor nodes in the network. The sensor upload packet size is fixed at D = 5 Mbits, the number of unmanned terminal antennas at U = 10, and the number of unmanned terminals at M = 2, while the number N of sensor nodes is set to 32, 64, 128, 256, and 512 in turn. From fig. 5 it can be observed that DRL-freshMCS performs better than the other methods, and that the sensing data acquisition delay of most methods increases monotonically with N, because under a limited channel transmission rate and battery power, more sensor nodes impose a larger workload on the unmanned terminal mobile base station. Conversely, when N is 32, the sensors in the task area are very sparse and there are usually only 1 to 2 sensor nodes near an unmanned terminal, so the advantage of using multiple antennas is not obvious and the problem becomes simpler; most methods other than Random therefore perform similarly in this case. Furthermore, when N is 256, the DRL-freshMCS method proposed by the present invention still has the sensing data acquisition delay closest to the theoretical optimal value.

As shown in fig. 6, this embodiment shows the influence of the number of unmanned terminals. The sensor upload packet size is fixed at D = 5 Mbits, the number of unmanned terminal antennas at U = 10, and the number of sensor nodes at N = 256, while the number M of unmanned terminals ranges from 2 to 30. From fig. 6 it can be observed that DRL-freshMCS still achieves a better sensing data acquisition delay than the other comparison methods. In addition, the gap between DRL-freshMCS and the theoretical optimal value becomes smaller and smaller as the number of deployed unmanned terminals increases. This is because a large number of unmanned terminals can cover a very large number of different sensor nodes within the same time step t, which also saves the time the terminals would otherwise spend moving around. The sensing data acquisition delay of the whole mobile crowd sensing scenario is therefore kept at a consistently low level.
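The four experiments share one pattern: one parameter is varied while the other three stay at fixed values. A minimal sketch of that setup, using only the values stated in the text, is given below. The dictionary and function names are hypothetical, and because the step sizes for U, D and M are not stated, only their end points are recorded.

```python
# Fixed values used when a parameter is not being swept (D is in Mbits).
DEFAULTS = {"U": 10, "D": 5, "N": 256, "M": 2}

# Sweep ranges stated in the text. Only N is given as an explicit list;
# for U, D and M only the end points are stated, so they are kept as (min, max).
SWEEPS = {
    "U": (5, 25),
    "D": (2, 10),
    "N": [32, 64, 128, 256, 512],
    "M": (2, 30),
}


def fixed_parameters(swept: str) -> dict:
    """Return the parameters held constant while `swept` is varied."""
    if swept not in SWEEPS:
        raise KeyError(f"unknown parameter: {swept}")
    return {name: value for name, value in DEFAULTS.items() if name != swept}


for name in SWEEPS:
    print(name, SWEEPS[name], fixed_parameters(name))
```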

With reference to figs. 7-8, this embodiment shows in detail how the DRL-freshMCS method optimizes the sensing data acquisition delay over a given time. The sensor upload packet size is fixed at D = 5 Mbits, the number of unmanned terminal antennas at U = 10, the number of sensor nodes at N = 256, and the number of unmanned terminals at M = 2. As shown in fig. 7, this embodiment presents a simulation diagram of the movement trajectories of the unmanned terminals over the first 30 time steps, together with the change in the sensing data acquisition delay of all sensors in three specified sub-regions within one round. From the right side of fig. 7 it can be observed that the unmanned terminals dynamically scheduled by DRL-freshMCS already achieve dynamic coverage of most sensors in the map. From the left side of fig. 7 it can be observed that the sensors in the different sub-regions are reset by the unmanned terminals at a certain period, each reset returning the current data acquisition delay to 1; the reset period is usually longer for distant sub-regions and shorter for regions close to the center. This shows that the DRL-freshMCS method can schedule the unmanned terminals flexibly and effectively according to the positions of the sensors, and that, while minimizing the data acquisition delay, it effectively reduces the energy consumed by extensive movement and prolongs the life cycle of the whole system.
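The reset behavior described above can be sketched in a few lines. The sketch assumes that a collected sensor's delay is reset to 1, that every other sensor's delay grows by 1 per time step, and that all sensors start at 1; the function name and the visit schedule are hypothetical.

```python
from typing import Iterable, List


def step_delays(delays: List[int], collected: Iterable[int]) -> List[int]:
    """Advance all sensor delays by one time step.

    Sensors whose indices are in `collected` are reset to 1;
    every other sensor's delay grows by 1 (assumed accumulation rule).
    """
    collected_set = set(collected)
    return [1 if i in collected_set else d + 1 for i, d in enumerate(delays)]


delays = [1, 1, 1, 1]            # assumed initial delay of 1 for every sensor
schedule = [[0], [1], [0], [2]]  # hypothetical visits, one list per time step
for visited in schedule:
    delays = step_delays(delays, visited)
print(delays)                    # [2, 3, 1, 5]
```

Sensor 0, which is visited often, stays at a low delay, while sensor 3, which is never visited, keeps accumulating. This mirrors the longer reset periods observed for distant sub-regions.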

As shown in fig. 8, this embodiment further shows how the different unmanned terminal dynamic scheduling algorithms optimize the sensing data acquisition delay over a complete round. The lowest curve, which is closest to the theoretical optimal value, is the DRL-freshMCS method of the present invention. It can be observed that the sensing data acquisition delay of the entire sensor network settles at a relatively stable value within 30 time steps and stays there until the round ends. This indicates that, by dynamically scheduling the unmanned terminals, the DRL-freshMCS method can continuously minimize the sensing data acquisition delay in the mobile unmanned crowd sensing task and thus stably maintain strong real-time freshness of the system data.

The present invention is not limited to the above-described embodiments, and any variations, modifications, and alterations that may occur to one skilled in the art without departing from the spirit of the invention are intended to be within the scope of the invention.

Claims

1. A method for minimizing perception data acquisition delay through dynamically scheduling an unmanned terminal is characterized by comprising the following steps:

The first calculation model is as follows (1):

in the above-mentioned formula (1),

to satisfy a uniformly distributed random multiplier between 0 and 0.25,

step 304, the base station will perform the best action currently

for independent uniform sampling of the current local distribution,

in the above-mentioned formula (3),

in the above formula (4), w_min is a given lower priority limit;

step 308, the main process of the base station initializes the simulation environment and, based on the optimized priority w_t of the prioritized experience multiplexing pool, outputs the best action of the unmanned terminal at each step; when the unmanned terminal has executed T steps of actions, the action sequence is output, the action sequence being the optimal trajectory capable of minimizing the sensing data acquisition delay in the current environment;

2. The method for minimizing sensing data acquisition delay by dynamically scheduling an unmanned terminal according to claim 1, wherein step 1, in which the base station main process sets up the distributed prioritized experience multiplexing pool and initializes the data acquisition delays of all sensors, comprises:

3. The method for minimizing sensing data acquisition delay by dynamically scheduling an unmanned terminal according to claim 1, wherein step 2, in which the base station starts subprocesses to simulate the unmanned terminal trajectory data acquisition delay and uploads it to the distributed prioritized experience multiplexing pool, comprises:

step 202, each subprocess, executing asynchronously, simulates the unmanned terminal trajectory and the change of the sensor sensing data acquisition delay in its own simulation environment; when it is detected that the unmanned terminal collides with an obstacle or runs out of power in a round, the subprocess immediately ends that round and reinitializes its own simulation environment parameters;

otherwise, at the current time t, the unmanned terminal observes the current situation state s_t, performs a movement and data acquisition action a_t, and moves to the positions of the sensors whose data acquisition delay needs to be reset in the current environment; the current data acquisition delay of these sensors is reset to 1, while the data acquisition delays of the other sensors accumulate automatically; the current situation reward r_t is calculated from the change in the data acquisition delay on each sensor; the situation state s_t of this subprocess, the movement and data acquisition action a_t, and the calculated current situation reward r_t are then sent to the distributed prioritized experience multiplexing pool.

4. The method for minimizing the sensing data acquisition delay through dynamically scheduling the unmanned terminal according to claim 1, wherein the step 4 of sending the optimal trajectory dynamic scheduling instruction to the unmanned terminal by the base station to acquire the latest data of the sensor in the sensing area comprises: