CN113805568A

CN113805568A - Man-machine cooperative perception method based on multi-agent space-time modeling and decision making

Info

Publication number: CN113805568A
Application number: CN202110943514.0A
Authority: CN
Inventors: 刘驰; 王宇; 朴成哲
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-08-17
Filing date: 2021-08-17
Publication date: 2021-12-17
Anticipated expiration: 2041-08-17
Also published as: CN113805568B

Abstract

The invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision making, which comprises the following steps: step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO; step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network; step 3, extracting global historical information from respective three-dimensional memory storage mapping; step 4, extracting context information which is critical to the current unmanned platform state; step 5, carrying out local updating on the three-dimensional memory storage mapping; step 6, finishing three-dimensional memory mapping Cubic Map by each unmanned platform; step 7, generating value estimation by using a value function, and executing generated actions by each unmanned platform to obtain reward values; step 8, repeatedly executing the steps 2-7; and 9, repeatedly executing the steps 1 to 8. The method has better sensing data acquisition effect, and can be widely applied to scenes with large area, complex environment and difficult communication.

Description

Man-machine cooperative perception method based on multi-agent space-time modeling and decision making

Technical Field

The invention relates to the technical field of mobile group perception, in particular to a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.

Background

The mobile group perception technology is a leading-edge research direction combining the Internet of things and artificial intelligence, a large number of mobile devices used by common users are used as basic perception units, the Internet of things and the mobile Internet are cooperated to realize perception task distribution and perception data collection and utilization, and large-scale and complex urban and social perception tasks are finally completed. However, mobile group perception systems based on mobile devices are often affected by various factors, such as uncertainty of user movement and quality problems of the mobile devices, and these factors may cause low quality of collected data and poor user satisfaction.

In addition to the mobile group perception technology taking people as the core, the rapid development of unmanned platform technologies such as unmanned aerial vehicles and unmanned vehicles is benefited, and the collection and propagation of perception data in urban environments by using the unmanned platforms are becoming practical.

Considering the coexistence of people and mobile unmanned platforms (such as unmanned express delivery vehicles for delivering express) in cities nowadays, the man-machine cooperative group perception can make up the quality problem caused by the group perception based on the people and the cost problem caused by the group perception based on the unmanned platforms, and the unmanned platforms are deployed to collect data from low-cost sensors distributed in buildings by fully utilizing the people in the cities, so that the data acquisition requirement under the smart cities can be better met.

However, in a real-world scenario, the technical challenges mainly faced by the man-machine collaborative mobile group perception technology are as follows:

technical challenge 1: the existing multi-agent learning technology based on centralized training cannot be used in a real environment, the existing multi-agent learning uses a centralized training mode to enable each agent to train the agent by utilizing respective local observation and global information obtained through information sharing, but because the area involved in information sharing is usually large and a plurality of obstacles for blocking communication signals exist, the centralized training cannot be carried out in the real environment.

Technical challenge 2: modeling of complex spatiotemporal information about moving people and dense obstacles in cities is difficult. In order to maximize the utilization of people in a city, it is necessary to consider that most people are uncontrollable, so the intelligent agent is required to plan its own data acquisition strategy according to the change of the space distribution of people with time.

Based on the problems in the prior art, the invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.

Disclosure of Invention

In order to solve the technical problems in the prior art, the invention provides a human-computer cooperative perception method based on multi-agent space-time modeling and decision-making.

The invention adopts the following technical scheme:

a man-machine cooperative perception method based on multi-agent space-time modeling and decision making comprises the following steps:

step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample library of each unmanned platform, randomly initializing a data acquisition strategy of each unmanned platform, and starting a data acquisition task in a fully distributed mode in cooperation with people;

step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;

step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, and global history information is extracted from each three-dimensional memory Map by using global convolution reading operation;

step 4, each unmanned platform uses reading operation based on context cross-correlation based on the spatial features in respective local observation and the global history information extracted from respective three-dimensional memory storage mapping, and weights the information in the three-dimensional memory storage mapping according to the cross-correlation coefficient between the information in the three-dimensional memory storage mapping and the local spatial features and the global history information;

step 5, each unmanned platform carries out local updating on the three-dimensional memory storage mapping based on the space characteristics in the current local observation;

step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space features in the current local observation, the global history information and the context information extracted from the respective three-dimensional memory mapping, and each unmanned platform finishes the three-dimensional memory mapping Cubic Map;

step 7, each unmanned platform uses a strategy function to generate actions and a value function to generate value estimation based on the characteristic vectors, and each unmanned platform executes the generated actions to obtain reward values;

step 8, repeatedly executing the steps 2-7 until the data acquisition task is finished, and optimizing a strategy function and a value function based on respective track data by each unmanned platform;

and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.

Further, step 1 comprises:

step 1.1, unmanned platform clustering

Empty sample library of each unmanned platform u

Random initialization parameter theta^u；

And step 1.2, initializing a time step t to be 0, and starting to interact with the human-computer cooperative crowd sensing environment.

Further, step 2 comprises:

step 2.1, for the current time step t, the human-computer cooperative crowd sensing environment has a global state s_tEach unmanned platform u obtains corresponding local observation according to the position of the unmanned platform u in the global space;

step 2.2, each unmanned platform u uses the convolution neural network phi (-) to extract eachSpatial features from local observations

Further, in step 3, the global historical spatiotemporal information is stored in a three-dimensional memory map

In the middle, global-based convolution read operations are used, all stored data are considered as a whole, and a convolution neural network is used to extract global information, as shown in the following formula (1):

in formula (1): phi is a^read(. cndot.) represents a convolutional neural network.

Further, in step 2.1, global state s^tFor a three-dimensional vector, the first two dimensions are associated with two-dimensional coordinates, f^sFor mapping the continuous coordinate values to discrete coordinate values,

successive coordinate values for the unmanned platform u at the current time step t, then

Where j controls the range of local observation.

Further, step 4 comprises:

step 4.1, using learnable parameter matrix

From current local spatial features

And global features

In which one is extracted by a convolution operationA query vector

The following formula (2):

in formula (2): denotes matrix multiplication, [; represents the concatenation of vectors;

step 4.2, calculating the query vector

And three-dimensional memory storage mapping

The cross correlation coefficient matrix between them, as shown in the following formula (3):

in formula (3): sigma denotes a sigmod activation function,

representing the calculation of cross-correlation coefficients;

step 4.3, use the cross correlation coefficient matrix

Mapping for three-dimensional memory storage

Weighting and generating a context vector by convolving the weighted results

The following formula (4):

in formula (4): f. of^c(. two-dimensional vector by third dimension replication of data

Expansion into a three-dimensional vector f^c(. cndot.) has the following formula (5):

in formula (5): an element-wise multiplication is indicated by an element.

Further, step 5 comprises:

step 5.1, store the mapping from three-dimensional memory

The cubic area to be updated is selected according to the current position of the unmanned platform

(X '× Y') determines the spatial granularity of the written eigenvectors;

step 5.2, using the learnable parameter matrix

From the input

And

in generating a reset gate vector by a convolution operation

The following formula (6):

step 5.3, using the learnable parameter matrix

From the input

And

generating an updated gate vector by a convolution operation

The following formula (7):

step 5.4, using the learnable parameter matrix

And

from the input

And

by convolution operation, using reset gates

Generating candidate vectors

The following formula (8):

step 5.5, updating the door

Integration

And candidate vectors

To give the following formula (9)

Step 5.6, use

To replace

In (1)

Generating a three-dimensional memory storage map for a next time step

Further, in step 6, the current spatial feature information is processed

Storing profiles

And contextual information

Performing a join operation to generate feature vectors using convolution on the join results

The following formula (10):

in formula (10): phi is a^output(. h) denotes a convolution operation, [;]representing the concatenation of vectors.

Further, step 7 comprises:

step 7.1, the unmanned platform u uses the feature vector

Separately inputting policy functions

And a cost function

Generating actions

And value estimation

Step 7.2, each unmanned platform u executes the action

Obtaining a prize value

And entering the next time step.

Further, step 8 comprises:

step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;

step 8.2, each unmanned platform u collects track data

And according to

Calculating a cumulative reward estimate

And advantage estimation

Calculating a cumulative reward estimate for a time step i

The following formula (11):

in formula (11): gamma epsilon [0, 1 is a discount factor, and the dominance estimation is calculated by using the GAE mode

The following formula (12):

in formula (12): lambda belongs to [0, 1]]Calculating a time differential offset for the discounting factor

The following formula (13):

step 8.3, each unmanned platform u is to

Slicing according to length K in time dimension, and adding the generated sequence sample into a sample library

Step 8.4, each unmanned platform u learns from the sample library in a batch mode

Collecting M sequence samples based on joint loss function in PPO

For parameter theta^uUpdate is performed, and then the next round is entered, wherein

Is a loss function of the policy function and,

is a loss function of the cost function,

is a regularization term that is related to the policy function,

the calculation formulas of (a) are as follows (14) to (16):

in formula (14): s is the policy entropy, c₁，c₂，∈₁，∈₂Are all constant.

Compared with the prior art, the invention has the following advantages:

1. the man-machine cooperative perception method based on multi-agent space-time modeling and decision-making is completely distributed in the training and testing stages, does not depend on any communication, can be easily applied to wide and complex scenes in space, solves the technical challenge that the existing multi-agent learning technology based on centralized training cannot be used in the actual scene, adopts FD-MAPPO as the training frame of the unmanned platform cluster, has better perception data acquisition effect compared with the existing multi-agent learning technology, and can be widely applied to scenes which are large in area, complex in environment and difficult to communicate;

2. the man-machine cooperative perception method based on multi-agent space-time modeling and decision-making adopts a three-dimensional memory storage mapping Cubic Map, uses an original storage structure, is matched with local writing operation according to positions to store long-term space-time sequence data, keeps the integrity of global overall space information while recording the internal detail information of local spaces, lays a foundation for better extracting the features in the long-term space-time sequence data, takes the designed global and context-based reading operation and output operation as an extraction method, can ensure the comprehensiveness and accuracy of feature extraction, simultaneously provides required local detail information, and solves the technical challenge of difficult modeling of complex space-time information about mobile crowds, dense obstacles and the like in cities.

Drawings

FIG. 1 is a schematic diagram illustrating a human-machine cooperative perception method based on multi-agent spatiotemporal modeling and decision-making in an embodiment of the present invention;

fig. 2 is a schematic diagram illustrating an influence of the number U of unmanned platforms on data acquisition efficiency (λ) in the sensing method according to the embodiment of the present invention;

FIG. 3 is a diagram illustrating a data acquisition rate of the unmanned platform U pair according to the sensing method in the embodiment of the present invention

Schematic diagram of the effects of (1);

fig. 4 is a schematic diagram illustrating an influence of the number U of unmanned platforms on geographic fairness (ξ) in the sensing method according to the embodiment of the present invention;

fig. 5 is a schematic diagram illustrating an influence of the number U of unmanned platforms on the synergistic factor (ζ) in the sensing method according to the embodiment of the present invention;

FIG. 6 is a schematic diagram illustrating an influence of the number U of unmanned platforms on the energy consumption rate (β) in the sensing method according to the embodiment of the present invention;

FIG. 7 is a diagram illustrating the utilization rate of the unmanned platform number U to the crowd according to the sensing method in the embodiment of the present invention

Schematic diagram of the effects of (1);

FIG. 8 is a diagram illustrating the crowd participation ratio ω to crowd utilization ratio of the sensing method according to an embodiment of the present invention

Schematic diagram of the effect of (c).

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments, it being understood that the embodiments and features of the embodiments of the present application can be combined with each other without conflict.

Examples

As shown in fig. 1, the human-computer cooperative perception method based on multi-agent spatiotemporal modeling and decision includes:

step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample base of each unmanned platform, randomly initializing a data acquisition strategy (namely initializing a neural network parameter for decision), and starting a data acquisition task in a fully distributed mode in cooperation with a crowd;

step 1.1, unmanned platform clustering

Empty sample library of each unmanned platform u

Random initialization parameter theta^u；

Step 1.2, initializing a time step t to be 0, and starting to interact with a human-computer cooperative crowd sensing environment;

step 2.1, for the current time step t, the human-computer cooperative crowd sensing environment has a global state st, and each unmanned platform u obtains corresponding local observation according to the position of the unmanned platform u in the global space

Step 2.2, each unmanned platform u uses the convolution neural network phi (-) to extract the spatial features in each local observation

Step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, global historical information is extracted from each three-dimensional memory Map by using a global convolution reading operation, wherein the global historical space-time information is stored in the three-dimensional memory Map

In the middle, global-based convolutional read operations are used, all stored data are treated as a whole, and a convolutional neural network is used to extract global information:

wherein phi^read(. -) represents a convolutional neural network;

step 4, each unmanned platform extracts global history information from the respective three-dimensional memory storage mapping based on the spatial features in the respective local observation, and extracts the context information which is critical to the state of the current unmanned platform from the respective three-dimensional memory storage mapping by using the reading operation based on context cross-correlation, namely, the information in the three-dimensional memory storage mapping is weighted according to the cross-correlation coefficient between the information in the three-dimensional memory storage mapping and the local spatial features and the global history information;

step 4.1, using learnable parameter matrix

From current local spatial features

And global features

Wherein a query vector is extracted by a convolution operation:

wherein denotes a matrix multiplication, [; represents the concatenation of vectors;

step 4.2, calculating the query vector

And three-dimensional memory storage mapping

Cross correlation coefficient matrix between:

where sigma denotes a sigmod activation function,

representing the calculation of cross-correlation coefficients;

step 4.3, use the cross correlation coefficient matrix

Mapping for three-dimensional memory storage

Weighting and generating a context vector by convolving the weighted results:

wherein f is^c(. to) two-dimensional vector by copying data in a third dimension

Expansion into a three-dimensional vector:

an by element multiplication;

step 5.1, store the mapping from three-dimensional memory

(X '× Y') determines the spatial granularity of the written eigenvectors;

step 5.2, using the learnable parameter matrix

From the input

And

generating a reset gate vector by a convolution operation:

step 5.3, using the learnable parameter matrix

From the input

And

generating an update gate vector by a convolution operation:

step 5.4, using the learnable parameter matrix

And

from the input

And

by convolution operation, using reset gates

Generating a candidate vector:

step 5.5, updating the door

Integration

And candidate vectors

Generating:

step 5.6, use

To replace

In (1)

Generating a three-dimensional memory storage map for a next time step

Step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space feature in the current local observation, the global historical information and the context information extracted from the respective three-dimensional memory storage mapping, each unmanned platform finishes the three-dimensional memory storage mapping Cubic Map, and the current space feature information is subjected to the Cubic Map

Storing profiles

And contextual information

Feature vectors are generated using convolution:

wherein phi^output(. -) represents a convolution operation;

step 7.1, the unmanned platform u uses the feature vector

Separately inputting policy functions

And a cost function

Generating actions

And value estimation

Step 7.2, each unmanned platform u executes the action

Obtaining a prize value

Entering the next time step;

step 8.2, each unmanned platform u collects track data

And according to

Calculating a cumulative reward estimate

And advantage estimation

Calculate the cumulative reward estimate for a certain time step i:

where γ ∈ [0, 1 is the discount factor, the dominance estimate is calculated using the GAE approach:

wherein λ ∈ [0, 1] is a discount factor, and the time difference deviation is as follows:

step 8.3, each unmanned platform u is to

Slicing according to K degrees in the time dimension, and adding the generated sequence samples into a sample library

Collecting M sequence samples based on the union in PPOSum loss function

Is a loss function of the policy function and,

is a loss function of the cost function,

is a regularization term that is related to the policy function,

the calculation formula of (a) is as follows:

where S is the policy entropy, c₁，c₂，∈₁，∈₂Are all constant;

In step 1 of the above embodiment, the method is used when the human-computer is cooperated with the data acquisition task

Represent an unmanned platform cluster, such thatBy using

Representing the population, in a round time range [0, T]In-person, unmanned platform cluster and crowd share low-cost sensor

In the collection of data, the unmanned platform cluster is usually restricted to flying below a certain altitude, for example in the united states, according to LAANC (low altitude authorization and notification capability), unmanned platforms can fly up to 120 meters in controlled airspace, all buildings are considered as obstacles that unmanned platforms cannot fly over due to different regulations on unmanned platform flying altitude in different regions, and furthermore, unmanned platform charging stations are deployed in cities such as parking lots

So as to lead the unmanned platform to go to supplement energy without loss of generality, a time slot system is adopted, namely the whole perception task is divided into equal T discrete time steps, all the unmanned platforms and the crowd move continuously in a two-dimensional environment, and each unmanned platform u can move a distance in any direction within the time step [ T, T + 1]

Wherein delta_maxIs the maximum moving distance calculated by the unmanned platform according to the maximum moving speed in a time step,

is the position of the unmanned platform u at the initial moment of the time step [ t, t +1), and each sensor p is provided with

The data volume is used for collecting unmanned platform clusters and crowds, and in each time step [ t, t + 1], if a certain sensor p is in the data perception range of a certain unmanned platform u and a certain person l, no sensor p is usedThe human platform u will collect

Individual data volume, person l will collect

An amount of data wherein

And

are constants that represent the amount of data that an unmanned platform and a person, respectively, can collect from a single sensor at most at a single time step,

the data quantity of the sensor p at the initial moment of the time step [ t, t +1) and the total data quantity collected by the unmanned platform u and the person l in the time step [ t, t +1) can be respectively represented as

And

wherein

And

respectively representing the sensors within the data perception ranges of the unmanned platform u and the person l in the time step [ t, t +1), each unmanned platform u is provided with a sensor in the beginning of the data acquisition task

An initial energy. At each time step [ t, t +1), the unmanned platform u will be consumed by the move

An energy, where η η is a mobile energy consumption factor, at the beginning of each time step [ t, t + 1], if the unmanned platform u is within the charging range of a certain charging station, and

then the unmanned platform u will be charged to

Energy, when a man-machine cooperative data acquisition task starts, each unmanned platform u empties a sample library

Random initialization parameter theta^uAnd setting the current time step t as 0, and enabling the unmanned platform cluster and the crowd to start interacting with the environment.

The simulation experiment of the above embodiment used a set of crowd movement trajectory data from the NCSU of university of america obtained from CRAWDAD, where there were 35 crowd movement trajectories in the NCSU, generated by 32 college students, using a GPS receiver to record their movement trajectories in their daily lives, where the GPS receiver would record the positions of selected students every 30 seconds during several hours to generate one trajectory, using google maps for marking map data, including the positions and shapes of buildings, lakes, and mountains, the north-south span of the NCSU being 1790.18 meters, the east-west being about 2028.70 meters, the floor area being about 363 ten thousand square meters, and similarly, 104 sensors were placed on 99 buildings

Randomly generating the data from 1GB to 1.5GB, setting the initial position of each unmanned platform as the central point of a scene, setting the maximum flight speed to be 12km/h, the initial energy to be e _0 to 20kJ, the mobile energy consumption factor eta to be 0.01kJ/m, and setting the data of the human platform and the unmanned platformThe sensing radius is 50 meters and 60 meters respectively, the rate of data acquisition from a single sensor is 8.3Mbps and 166.7Mbps respectively, and the charging radius of the charging station is 20 meters in consideration of the length of the cable.

In step 2 of the above embodiment, with respect to the observation of the global scope, the observation in the area with the unmanned platform as the center and a distance as the radius is called local observation;

in step 7 of the above embodiment, the actions generated by using the policy function are, for example: distances of movement along two coordinate axes in a two-dimensional coordinate system respectively; each unmanned platform performs the generated action to obtain a reward value, such as: when the obstacle is hit, a negative reward is given, and when the data is collected, a positive reward is given.

In order to further show the performance of the embodiment in the aspect of the human-computer cooperative mobile group perception task, a complete and thorough system test is carried out, and the specific evaluation form is 6 indexes of the system when a round is finished:

1. data acquisition rate

And the total data volume collected by all the unmanned platforms accounts for the total proportion of the initial data volume of the sensor.

2. Geographic fairness (ξ): and calculating the geographic fairness of the data acquired by all the unmanned platforms by adopting a Jain fairness index.

3. Cofactor (ζ): degree of synergy between unmanned platform clusters and populations.

4. Energy consumption rate (β): and the consumed energy accounts for the proportion of the sum of the initial energy and the supplementary energy of all the unmanned platforms in the movement of all the unmanned platforms.

5. Crowd utilization rate

The proportion of the data volume collected by the crowd under the ideal condition (without unmanned platform) is occupied by the actual data volume collected by the crowd, wherein

Representing a subset of the population.

6. Data acquisition efficiency (λ): the aim of the invention is to maximize the data acquisition rate

The geographic fairness (xi) and the synergistic factor (zeta) are simultaneously minimized, and the energy consumption rate (beta) is synthesized into an index:

in addition, the following 6 reference techniques were used for comparison:

1. FD-MAPPO (neural map): neural Map is an existing and advanced memory storage structure, and maintains a two-dimensional three-dimensional memory storage Map by using operations such as reading, writing, updating and outputting, and the Neural Map is matched by using the FD-MAPPO training framework provided by the invention in order to compare the Neural Map with the Cubic Map.

2. RPG: this is an existing, advanced multi-agent deep reinforcement learning technique that uses a reward random exploration technique to achieve better performance based on a policy gradient algorithm.

3. IPPO: the method is a multi-agent deep reinforcement learning technology based on a PPO algorithm, wherein each agent shares parameters.

4. PPO: the technology is an advanced single-agent deep reinforcement learning technology.

5. e-Divert: this is a multi-agent deep reinforcement learning technique based on the MADDPG algorithm, which uses a distributed prior experience pool and LSTM to obtain better performance, which is the most advanced multi-agent deep reinforcement learning crowd-sourcing sensing technique.

6. Random: each unmanned platform moves by adopting a random strategy.

The number U of unmanned platforms in the scene and the crowd participation ratio omega are respectively used as independent variables, and the dependent variables are the evaluation indexes, namely the data acquisition efficiency (lambda) and the data acquisition rate

Geographic fairness (ξ), co-factor (ζ), energy consumption rate (β) and crowd utilization rate

Two sets of tests were performed:

as shown in fig. 2, the learning framework consistently outperforms all other baseline techniques in terms of efficiency for the following reasons: PPO uses only one agent to control the behavior of multiple unmanned platforms, and may miss feedback from a particular unmanned platform, so that cooperation inside the unmanned platform cannot be sufficiently achieved; although e-river uses multiple agents, its deterministic strategy, DDPG, adopted is not good at action exploration, but is just crucial in the environment (e.g., in NCSU environments, the north and south campuses communicate through only two thin tunnels); IPPO performs far better than PPO and e-river because it uses multiple agents and a random strategy, however, parameter sharing across agents limits the potential of each unmanned platform to accurately capture observed features; RPG and FD-mappo (neural map) perform better than other baseline techniques, but still perform worse than FD-mappo (cubic map) because RPG explores using rewarding stochastic exploration techniques, but neglects spatio-temporal modeling of long trajectory sequences, in which respect the unmanned platform is easily disoriented and trapped in the obstacle; FD-mappo (neural map) tends to model spatio-temporal correlation by a two-dimensional map, but because the technique flattens the 3D tensor into a 1D tensor in the writing operation, the technique loses almost all spatial information in the modeling process;

as shown in FIGS. 2-5, the data acquisition efficiency (λ), data acquisition rate, of all methods when increasing the number of unmanned platforms

Both geographic fairness (ξ) and co-factor (ζ) will rise first and then gradually saturate as shown in fig. 6, with the use of more unmanned platforms, except e-Din addition to the vett technology, the energy consumption rate (β) of all the methods has an ascending trend and then tends to be saturated, because more unmanned platforms are used, so that corners and remote areas can be explored at an opportunity to collect richer data, but too many unmanned platforms do not bring additional benefits, so the energy consumption rate (β) tends to be saturated gradually, however, in contrast, the energy consumption rate (β) corresponding to the e-river technology is a descending trend and then tends to be saturated at first, and by checking the track of the unmanned platforms, the area explored by the newly added unmanned platforms is a corner with extremely few data due to the poor obstacle avoidance and exploration capacity of the e-river, so that the unmanned platforms are blocked by obstacles and stop moving at an early stage, and thus less energy is consumed;

as shown in fig. 7, the crowd utilization ratio

With the use of unmanned platforms increasing and with less unmanned platforms, the FD-mappo (cubic map) technique maintains relatively high crowd utilization by navigating unmanned platforms to those regions with few people

The highest efficiencies are achieved compared to FD-mappo (neural map) and RPG technologies, and as more unmanned platforms are deployed, they are also distributed to areas where sensors are densely deployed, and it is difficult to collect all data, although people may also be present in these areas, in order to collect as much data as possible. Crowd utilization if an unmanned platform intentionally bypasses the crowd to achieve a good 'cooperation' level

This increases, but moving long distances results in higher energy consumption rates and ultimately still results in reduced efficiency.

As shown in FIG. 8, the crowd utilization ratio can be observed

This is always going down because 25% of the people initially enabled are already enough, so adding more people does not bring additional benefit to the data collection task that the crowd does in cooperation with the unmanned platform, Random, e-Divert, PPO and IPPO technologies do not perform well but crowd utilization is poor

Are relatively high because they may explore areas where a crowd is present and where the crowd has collected almost all of the data, the present invention maintains relatively high crowd utilization compared to RPG and FD-mappo (neural map) technologies

This is because FD-MAPPO (Cubic map) reduces the probability of redundant data collection by human-machine, and when the number of unmanned platforms is small (U ≦ 4), the unmanned platforms will be navigated to areas where the crowd cannot collect all data alone.

The present invention is not limited to the above-described embodiments, which are described in the specification and illustrated only for illustrating the principle of the present invention, but various changes and modifications may be made within the scope of the present invention as claimed without departing from the spirit and scope of the present invention. The scope of the invention is defined by the appended claims.

Claims

1. A man-machine cooperative perception method based on multi-agent space-time modeling and decision making is characterized by comprising the following steps:

2. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein step 1 comprises:

step 1.1, unmanned platform clustering

Empty sample library of each unmanned platform u

Random initialization parameter theta^u；

3. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 2 comprises:

step 2.1, for the current time step t, the human-computer cooperative crowd sensing environment has a global state s_tEach unmanned platform u obtains a corresponding local observation according to its position in global space

4. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 3, global historical spatiotemporal information is stored in a three-dimensional memory storage map

5. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 4 comprises:

step 4.1, using learnable parameter matrix

From current local spatial features

And global features

Wherein a query vector is extracted by a convolution operation

The following formula (2):

step 4.2, calculating the query vector

And three-dimensional memory storage mapping

in formula (3): sigma denotes a sigmod activation function,

representing the calculation of cross-correlation coefficients;

step (ii) of4.3 Using the Cross correlation coefficient matrix

Mapping for three-dimensional memory storage

Weighting and generating a context vector by convolving the weighted results

The following formula (4):

in formula (5): an element-wise multiplication is indicated by an element.

6. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 5 comprises:

step 5.1, store the mapping from three-dimensional memory

Determines the write eigenvectorSpatial granularity of (a);

step 5.2, using the learnable parameter matrix

From the input

And

in generating a reset gate vector by a convolution operation

The following formula (6):

step 5.3, using the learnable parameter matrix

From the input

And

generating an updated gate vector by a convolution operation

The following formula (7):

step 5.4, using the learnable parameter matrix

And

from the input

And

by convolution operation, using reset gates

Generating candidate vectors

The following formula (8):

step 5.5, updating the door

Integration

And candidate vectors

To give the following formula (9)

Step 5.6, use

To replace

In (1)

Generating a three-dimensional memory storage map for a next time step

7. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 6, the current spatial feature information is processed

Storing profiles

And contextual information

Feature vector generation using convolution

The following formula (10):

8. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 7 comprises:

step 7.1, the unmanned platform u uses the feature vector

Separately inputting policy functions

And a cost function

Generating actions

And value estimation

Step 7.2, each unmanned platform u executes the action

Obtaining a prize value

And entering the next time step.

9. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 8 comprises:

step 8.2, each unmanned platform u collects track data