CN113805568A - Man-machine cooperative perception method based on multi-agent space-time modeling and decision making - Google Patents

Info

Publication number
CN113805568A
CN113805568A
Authority
CN
China
Prior art keywords: unmanned platform, three-dimensional memory, agent, following formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110943514.0A
Other languages
Chinese (zh)
Other versions
CN113805568B (en)
Inventor
刘驰
王宇
朴成哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110943514.0A priority Critical patent/CN113805568B/en
Publication of CN113805568A publication Critical patent/CN113805568A/en
Application granted granted Critical
Publication of CN113805568B publication Critical patent/CN113805568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots, characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision making, which comprises the following steps: step 1, start the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO; step 2, each unmanned platform extracts spatial features from its local observation using its own convolutional neural network; step 3, extract global historical information from each platform's three-dimensional memory storage map; step 4, extract the context information critical to the current unmanned platform state; step 5, locally update the three-dimensional memory storage map; step 6, each unmanned platform finishes the three-dimensional memory map Cubic Map; step 7, generate value estimates with the value function, and each unmanned platform executes its generated action to obtain a reward value; step 8, repeatedly execute steps 2-7; step 9, repeatedly execute steps 1-8. The method achieves a better perception data acquisition effect and can be widely applied to scenes with large areas, complex environments and difficult communication.

Description

Man-machine cooperative perception method based on multi-agent space-time modeling and decision making
Technical Field
The invention relates to the technical field of mobile crowd sensing, and in particular to a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.
Background
Mobile crowd sensing is a leading-edge research direction combining the Internet of Things and artificial intelligence: a large number of mobile devices carried by ordinary users serve as basic sensing units, and the Internet of Things and the mobile Internet cooperate to distribute sensing tasks and to collect and use sensing data, finally completing large-scale and complex urban and social sensing tasks. However, mobile crowd sensing systems based on mobile devices are often affected by factors such as the uncertainty of user movement and the quality of the mobile devices, which may lead to low quality of the collected data and poor user satisfaction.
In addition to human-centric mobile crowd sensing, and benefiting from the rapid development of unmanned platform technologies such as unmanned aerial vehicles and unmanned ground vehicles, using unmanned platforms to collect and propagate sensing data in urban environments is becoming practical.
Considering that people and mobile unmanned platforms (such as unmanned express delivery vehicles) now coexist in cities, man-machine cooperative crowd sensing can compensate for the quality problems of purely human-based crowd sensing and the cost problems of purely unmanned-platform-based crowd sensing. By fully utilizing the people already in the city and deploying unmanned platforms to collect data from the low-cost sensors distributed on buildings, the data acquisition demands of smart cities can be better met.
However, in a real-world scenario, the technical challenges mainly faced by the man-machine collaborative mobile group perception technology are as follows:
technical challenge 1: existing multi-agent learning techniques based on centralized training cannot be used in a real environment. Existing multi-agent learning uses a centralized training mode in which each agent trains itself using its own local observations together with global information obtained through information sharing; however, because the area involved in information sharing is usually large and contains many obstacles that block communication signals, centralized training cannot be carried out in a real environment.
Technical challenge 2: modeling the complex spatiotemporal information about moving crowds and dense obstacles in cities is difficult. To maximize the utilization of the people in a city, it must be considered that most people are uncontrollable, so each agent is required to plan its own data acquisition strategy according to how the spatial distribution of people changes over time.
Based on the problems in the prior art, the invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a human-computer cooperative perception method based on multi-agent space-time modeling and decision-making.
The invention adopts the following technical scheme:
a man-machine cooperative perception method based on multi-agent space-time modeling and decision making comprises the following steps:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample library of each unmanned platform, randomly initializing a data acquisition strategy of each unmanned platform, and starting a data acquisition task in a fully distributed mode in cooperation with people;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, and global history information is extracted from each three-dimensional memory Map by using global convolution reading operation;
step 4, based on the spatial features in its local observation and the global historical information extracted from its three-dimensional memory storage map, each unmanned platform uses a read operation based on context cross-correlation and weights the information in the three-dimensional memory storage map according to the cross-correlation coefficients between that information and the local spatial features and the global historical information;
step 5, each unmanned platform carries out local updating on the three-dimensional memory storage mapping based on the space characteristics in the current local observation;
step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space features in the current local observation, the global history information and the context information extracted from the respective three-dimensional memory mapping, and each unmanned platform finishes the three-dimensional memory mapping Cubic Map;
step 7, each unmanned platform uses a strategy function to generate actions and a value function to generate value estimation based on the characteristic vectors, and each unmanned platform executes the generated actions to obtain reward values;
step 8, repeatedly executing the steps 2-7 until the data acquisition task is finished, and optimizing a strategy function and a value function based on respective track data by each unmanned platform;
and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.
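The control flow of steps 1 to 9 can be summarized in the following minimal Python sketch. Every identifier in it (the environment API, agent methods such as encode, read_global, read_context, write_local, act, update_with_ppo) is an illustrative assumption rather than part of the disclosure; only the ordering of operations follows the steps above.

```python
# Hedged sketch of the FD-MAPPO outer loop (steps 1-9); all names are
# assumed placeholders, only the order of operations mirrors the text.

def run_fd_mappo(env, agents, num_rounds, horizon):
    """Fully distributed training: no parameters, observations or
    gradients are shared between unmanned platforms."""
    for _ in range(num_rounds):                        # step 9: repeat rounds
        for agent in agents:                           # step 1: reset
            agent.reset_memory_and_buffer()
        observations = env.reset()
        for _ in range(horizon):                       # step 8: repeat 2-7
            actions = []
            for agent, obs in zip(agents, observations):
                feat = agent.encode(obs)               # step 2: CNN features
                glob = agent.read_global()             # step 3: global read
                ctx = agent.read_context(feat, glob)   # step 4: context read
                agent.write_local(feat)                # step 5: local update
                actions.append(agent.act(feat, glob, ctx))  # steps 6-7
            observations, rewards, done = env.step(actions)
            for agent, reward in zip(agents, rewards): # step 7: reward
                agent.store_reward(reward)
            if done:
                break
        for agent in agents:                           # step 8: PPO update
            agent.update_with_ppo()
```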
Further, step 1 comprises:
step 1.1, for each unmanned platform $u$ in the unmanned platform cluster $\mathcal{U}$, empty its sample library $\mathcal{D}_u$ and randomly initialize its parameters $\theta_u$;
and step 1.2, initialize the time step $t = 0$ and start to interact with the human-computer cooperative crowd sensing environment.
Further, step 2 comprises:
step 2.1, for the current time step $t$, the human-computer cooperative crowd sensing environment has a global state $s_t$, and each unmanned platform $u$ obtains a corresponding local observation $o_t^u$ according to its position in the global space;
step 2.2, each unmanned platform $u$ uses its convolutional neural network $\phi(\cdot)$ to extract the spatial features $e_t^u = \phi(o_t^u)$ from its local observation.
Further, in step 3, the global historical spatiotemporal information is stored in a three-dimensional memory storage map $M_t^u$. The global convolutional read operation treats all stored data as a whole and extracts the global information with a convolutional neural network, as shown in the following formula (1):

$$g_t^u = \phi_{read}(M_t^u) \tag{1}$$

In formula (1): $\phi_{read}(\cdot)$ represents a convolutional neural network.
Further, in step 2.1, the global state $s_t$ is a three-dimensional tensor whose first two dimensions are associated with the two-dimensional coordinates, $f_s$ maps continuous coordinate values to discrete coordinate values, and $(x_t^u, y_t^u)$ are the continuous coordinates of the unmanned platform $u$ at the current time step $t$; the local observation $o_t^u$ is then the sub-tensor of $s_t$ centered on the discretized coordinates $(f_s(x_t^u), f_s(y_t^u))$, where $j$ controls the range of the local observation.
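A minimal sketch of how such a local observation could be cropped from the global state tensor follows; the clipping at the map edge is an assumption, since the text does not specify the boundary handling.

```python
import numpy as np

def local_observation(global_state, x, y, f_s, j):
    """Crop the window of the global state tensor centred on the
    platform's discretized position (f_s(x), f_s(y)); j controls the
    range of the local observation. Edge clipping is an assumption
    not fixed by the text."""
    gx, gy = f_s(x), f_s(y)
    X, Y, _ = global_state.shape
    x0, x1 = max(gx - j, 0), min(gx + j + 1, X)
    y0, y1 = max(gy - j, 0), min(gy + j + 1, Y)
    return global_state[x0:x1, y0:y1, :]
```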
Further, step 4 comprises:
step 4.1, using a learnable parameter matrix $W_q^u$, a query vector $q_t^u$ is extracted from the current local spatial features $e_t^u$ and the global features $g_t^u$ by a convolution operation, as shown in the following formula (2):

$$q_t^u = W_q^u * [e_t^u; g_t^u] \tag{2}$$

In formula (2): $*$ denotes the convolution (matrix multiplication) and $[\cdot;\cdot]$ denotes the concatenation of vectors;
step 4.2, the cross-correlation coefficient matrix between the query vector $q_t^u$ and the three-dimensional memory storage map $M_t^u$ is calculated, as shown in the following formula (3):

$$A_t^u = \sigma(q_t^u \star M_t^u) \tag{3}$$

In formula (3): $\sigma$ denotes the sigmoid activation function and $\star$ denotes the calculation of cross-correlation coefficients;
step 4.3, the cross-correlation coefficient matrix $A_t^u$ is used to weight the three-dimensional memory storage map $M_t^u$, and a context vector $c_t^u$ is generated by convolving the weighted result, as shown in the following formula (4):

$$c_t^u = \phi_c\big(f_c(A_t^u) \odot M_t^u\big) \tag{4}$$

In formula (4): $\odot$ denotes element-wise multiplication, and $f_c(\cdot)$ expands the two-dimensional matrix $A_t^u$ into a three-dimensional tensor by replicating the data along the third dimension, as shown in the following formula (5):

$$f_c(A)[x, y, z] = A[x, y] \tag{5}$$
Further, step 5 comprises:
step 5.1, the cubic region $m_t^u$ to be updated is selected from the three-dimensional memory storage map $M_t^u$ according to the current position of the unmanned platform, and its size $(X' \times Y')$ determines the spatial granularity of the written feature vectors;
step 5.2, using a learnable parameter matrix $W_r^u$, a reset gate vector $r_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation, as shown in the following formula (6):

$$r_t^u = \sigma\big(W_r^u * [e_t^u; m_t^u]\big) \tag{6}$$

step 5.3, using a learnable parameter matrix $W_z^u$, an update gate vector $z_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation, as shown in the following formula (7):

$$z_t^u = \sigma\big(W_z^u * [e_t^u; m_t^u]\big) \tag{7}$$

step 5.4, using learnable parameter matrices $W_h^u$ and $U_h^u$, a candidate vector $\tilde{m}_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation that applies the reset gate $r_t^u$, as shown in the following formula (8):

$$\tilde{m}_t^u = \tanh\big(W_h^u * e_t^u + U_h^u * (r_t^u \odot m_t^u)\big) \tag{8}$$

step 5.5, the update gate $z_t^u$ integrates $m_t^u$ and the candidate vector $\tilde{m}_t^u$, giving the following formula (9):

$$\hat{m}_t^u = (1 - z_t^u) \odot m_t^u + z_t^u \odot \tilde{m}_t^u \tag{9}$$

step 5.6, $\hat{m}_t^u$ is used to replace $m_t^u$ in $M_t^u$, generating the three-dimensional memory storage map $M_{t+1}^u$ for the next time step.
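Formulas (6)-(9) amount to a convolutional GRU-style update restricted to the selected memory patch. A hedged PyTorch sketch follows; the kernel size, the channel counts, and the assumption that $e_t^u$ matches the $X' \times Y'$ patch spatially are not fixed by the text.

```python
import torch
import torch.nn as nn

class LocalWrite(nn.Module):
    """Sketch of the local write of formulas (6)-(9): a convolutional
    GRU-style update applied only to the X' x Y' memory patch."""

    def __init__(self, feat_ch, mem_ch, k=3):
        super().__init__()
        self.w_r = nn.Conv2d(feat_ch + mem_ch, mem_ch, k, padding=k // 2)
        self.w_z = nn.Conv2d(feat_ch + mem_ch, mem_ch, k, padding=k // 2)
        self.w_h = nn.Conv2d(feat_ch + mem_ch, mem_ch, k, padding=k // 2)

    def forward(self, e_t, patch):
        # e_t: (B, feat_ch, X', Y'); patch: (B, mem_ch, X', Y')
        x = torch.cat([e_t, patch], dim=1)
        r = torch.sigmoid(self.w_r(x))                     # reset gate (6)
        z = torch.sigmoid(self.w_z(x))                     # update gate (7)
        cand = torch.tanh(self.w_h(torch.cat([e_t, r * patch], dim=1)))  # (8)
        return (1 - z) * patch + z * cand                  # integration (9)
```

Per step 5.6, the returned patch would then be written back into $M_t^u$ at the platform's position to produce $M_{t+1}^u$.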
Further, in step 6, the current spatial feature information $e_t^u$, the global feature $g_t^u$ read from the memory storage map and the context information $c_t^u$ are joined by a concatenation operation, and a feature vector $h_t^u$ is generated by applying a convolution to the concatenation result, as shown in the following formula (10):

$$h_t^u = \phi_{output}\big([e_t^u; g_t^u; c_t^u]\big) \tag{10}$$

In formula (10): $\phi_{output}(\cdot)$ denotes a convolution operation and $[\cdot;\cdot]$ denotes the concatenation of vectors.
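As a small illustration of formula (10), assuming for simplicity that the three read-outs share one channel count ch (the patent does not fix any tensor size):

```python
import torch
import torch.nn as nn

ch = 32  # illustrative channel count, not specified by the patent
phi_output = nn.Conv2d(3 * ch, ch, kernel_size=3, padding=1)

def output_features(e_t, g_t, c_t):
    """Formula (10): concatenate the three read-outs and convolve them
    into the feature vector fed to the policy and value heads."""
    h = phi_output(torch.cat([e_t, g_t, c_t], dim=1))
    return torch.flatten(h, start_dim=1)
```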
Further, step 7 comprises:
step 7.1, the unmanned platform $u$ feeds the feature vector $h_t^u$ separately into the policy function $\pi_{\theta_u}(\cdot \mid h_t^u)$ and the value function $V_{\theta_u}(h_t^u)$, generating an action $a_t^u$ and a value estimate $v_t^u$;
step 7.2, each unmanned platform $u$ executes the action $a_t^u$, obtains a reward value $R_t^u$, and enters the next time step.
Further, step 8 comprises:
step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;
step 8.2, each unmanned platform $u$ collects its trajectory data $\tau_u = \{(o_t^u, a_t^u, R_t^u, v_t^u)\}_{t=0}^{T}$ and, according to $\tau_u$, calculates a cumulative reward estimate $\hat{R}_i^u$ and an advantage estimate $\hat{A}_i^u$; the cumulative reward estimate for a time step $i$ is calculated as shown in the following formula (11):

$$\hat{R}_i^u = \sum_{k=i}^{T} \gamma^{k-i} R_k^u \tag{11}$$

In formula (11): $\gamma \in [0, 1]$ is a discount factor; the advantage estimate $\hat{A}_i^u$ is calculated in the GAE manner, as shown in the following formula (12):

$$\hat{A}_i^u = \sum_{l=0}^{T-i} (\gamma\lambda)^l \delta_{i+l}^u \tag{12}$$

In formula (12): $\lambda \in [0, 1]$ is a discount factor, and the temporal-difference error $\delta_i^u$ is calculated as shown in the following formula (13):

$$\delta_i^u = R_i^u + \gamma v_{i+1}^u - v_i^u \tag{13}$$

step 8.3, each unmanned platform $u$ slices $\tau_u$ into segments of length $K$ along the time dimension and adds the generated sequence samples to the sample library $\mathcal{D}_u$;
step 8.4, each unmanned platform $u$ samples $M$ sequence samples from the sample library $\mathcal{D}_u$ in a mini-batch manner, updates the parameters $\theta_u$ based on the joint loss function $L(\theta_u)$ of PPO, and then enters the next round, where $L^{policy}$ is the loss function of the policy function, $L^{value}$ is the loss function of the value function, and $S$ is a regularization term related to the policy function; the calculation formulas are as follows (14) to (16):

$$L(\theta_u) = \mathbb{E}_i\Big[L_i^{policy}(\theta_u) + c_1 L_i^{value}(\theta_u) - c_2 S_i(\theta_u)\Big] \tag{14}$$

$$L_i^{policy}(\theta_u) = \min\Big(\rho_i \hat{A}_i^u,\ \mathrm{clip}(\rho_i,\, 1-\epsilon_1,\, 1+\epsilon_1)\, \hat{A}_i^u\Big), \qquad \rho_i = \frac{\pi_{\theta_u}(a_i^u \mid h_i^u)}{\pi_{\theta_u^{old}}(a_i^u \mid h_i^u)} \tag{15}$$

$$L_i^{value}(\theta_u) = \max\Big(\big(v_i^u - \hat{R}_i^u\big)^2,\ \big(\mathrm{clip}(v_i^u,\, v_i^{u,old}-\epsilon_2,\, v_i^{u,old}+\epsilon_2) - \hat{R}_i^u\big)^2\Big) \tag{16}$$

In formula (14): $S$ is the policy entropy, and $c_1, c_2, \epsilon_1, \epsilon_2$ are all constants.
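The following sketch shows how formulas (11)-(16) could be computed for one trajectory. The sign convention, minimizing the negative clipped surrogate plus the clipped value loss minus the entropy bonus, is an assumption consistent with the loss form stated above, and the default constants are illustrative.

```python
import torch

def gae_and_returns(rewards, values, gamma=0.99, lam=0.95):
    """Formulas (11)-(13): discounted returns and GAE advantages.
    rewards, values: 1-D tensors of equal length T."""
    T = rewards.shape[0]
    returns, advantages = torch.zeros(T), torch.zeros(T)
    running_return, running_adv = 0.0, 0.0
    for i in reversed(range(T)):
        running_return = rewards[i] + gamma * running_return        # (11)
        next_value = values[i + 1] if i + 1 < T else 0.0
        delta = rewards[i] + gamma * next_value - values[i]         # (13)
        running_adv = delta + gamma * lam * running_adv             # (12)
        returns[i], advantages[i] = running_return, running_adv
    return returns, advantages

def ppo_joint_loss(logp, logp_old, adv, v, v_old, ret, entropy,
                   c1=0.5, c2=0.01, eps1=0.2, eps2=0.2):
    """Formulas (14)-(16); c1, c2, eps1, eps2 play the roles of the
    constants c_1, c_2, eps_1, eps_2 (default values are assumptions)."""
    ratio = torch.exp(logp - logp_old)                              # rho_i
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - eps1, 1 + eps1) * adv)  # (15)
    v_clipped = v_old + torch.clamp(v - v_old, -eps2, eps2)
    value_loss = torch.max((v - ret) ** 2, (v_clipped - ret) ** 2)       # (16)
    return (-surrogate + c1 * value_loss - c2 * entropy).mean()          # (14)
```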
Compared with the prior art, the invention has the following advantages:
1. The man-machine cooperative perception method based on multi-agent space-time modeling and decision-making is fully distributed in both the training and testing stages and does not depend on any communication, so it can easily be applied to spatially wide and complex scenes. This solves the technical challenge that existing multi-agent learning techniques based on centralized training cannot be used in real scenes. Adopting FD-MAPPO as the training framework of the unmanned platform cluster yields a better perception data acquisition effect than existing multi-agent learning techniques, and the method can be widely applied to scenes with large areas, complex environments and difficult communication;
2. The man-machine cooperative perception method based on multi-agent space-time modeling and decision-making adopts the three-dimensional memory storage map Cubic Map, an original storage structure that, combined with position-based local write operations, stores long-term spatiotemporal sequence data. It records the internal detail information of local spaces while keeping the integrity of the overall global spatial information, laying a foundation for better extracting features from long-term spatiotemporal sequence data. The designed global and context-based read operations and the output operation serve as the extraction method, ensuring the comprehensiveness and accuracy of feature extraction while providing the required local detail information, thereby solving the technical challenge that complex spatiotemporal information in cities, such as moving crowds and dense obstacles, is difficult to model.
Drawings
FIG. 1 is a schematic diagram illustrating a human-machine cooperative perception method based on multi-agent spatiotemporal modeling and decision-making in an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the data acquisition efficiency (λ) of the sensing method in the embodiment of the present invention;
fig. 3 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the data acquisition rate of the sensing method in the embodiment of the present invention;
fig. 4 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the geographic fairness (ξ) of the sensing method in the embodiment of the present invention;
fig. 5 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the cooperation factor (ζ) of the sensing method in the embodiment of the present invention;
fig. 6 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the energy consumption rate (β) of the sensing method in the embodiment of the present invention;
fig. 7 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the crowd utilization rate of the sensing method in the embodiment of the present invention;
fig. 8 is a schematic diagram illustrating the influence of the crowd participation ratio ω on the crowd utilization rate of the sensing method in the embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments; it should be understood that the embodiments of the present application, and the features within them, may be combined with each other without conflict.
Examples
As shown in fig. 1, the human-computer cooperative perception method based on multi-agent spatiotemporal modeling and decision includes:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample base of each unmanned platform, randomly initializing a data acquisition strategy (namely initializing a neural network parameter for decision), and starting a data acquisition task in a fully distributed mode in cooperation with a crowd;
step 1.1, for each unmanned platform $u$ in the unmanned platform cluster $\mathcal{U}$, the sample library $\mathcal{D}_u$ is emptied and the parameters $\theta_u$ are randomly initialized;
step 1.2, the time step is initialized to $t = 0$, and interaction with the human-computer cooperative crowd sensing environment starts;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 2.1, for the current time step $t$, the human-computer cooperative crowd sensing environment has a global state $s_t$, and each unmanned platform $u$ obtains a corresponding local observation $o_t^u$ according to its position in the global space;
step 2.2, each unmanned platform $u$ uses its convolutional neural network $\phi(\cdot)$ to extract the spatial features $e_t^u = \phi(o_t^u)$ from its local observation;
Step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, global historical information is extracted from each three-dimensional memory Map by using a global convolution reading operation, wherein the global historical space-time information is stored in the three-dimensional memory Map
Figure BDA0003216013420000073
In the middle, global-based convolutional read operations are used, all stored data are treated as a whole, and a convolutional neural network is used to extract global information:
Figure BDA0003216013420000074
wherein phiread(. -) represents a convolutional neural network;
step 4, based on the spatial features in its local observation and the global historical information extracted from its three-dimensional memory storage map, each unmanned platform uses a read operation based on context cross-correlation to extract from its memory storage map the context information critical to the current unmanned platform state, i.e., the information in the three-dimensional memory storage map is weighted according to the cross-correlation coefficients between that information and the local spatial features and global historical information;
step 4.1, using a learnable parameter matrix $W_q^u$, a query vector $q_t^u = W_q^u * [e_t^u; g_t^u]$ is extracted from the current local spatial features $e_t^u$ and the global features $g_t^u$ by a convolution operation, where $*$ denotes the convolution (matrix multiplication) and $[\cdot;\cdot]$ denotes the concatenation of vectors;
step 4.2, the cross-correlation coefficient matrix $A_t^u = \sigma(q_t^u \star M_t^u)$ between the query vector and the three-dimensional memory storage map is calculated, where $\sigma$ denotes the sigmoid activation function and $\star$ denotes the calculation of cross-correlation coefficients;
step 4.3, the cross-correlation coefficient matrix $A_t^u$ is used to weight the three-dimensional memory storage map, and a context vector $c_t^u = \phi_c(f_c(A_t^u) \odot M_t^u)$ is generated by convolving the weighted result, where $f_c(\cdot)$ expands the two-dimensional matrix into a three-dimensional tensor by replicating the data along the third dimension, $f_c(A)[x, y, z] = A[x, y]$, and $\odot$ denotes element-wise multiplication;
step 5, each unmanned platform carries out local updating on the three-dimensional memory storage mapping based on the space characteristics in the current local observation;
step 5.1, the cubic region $m_t^u$ to be updated is selected from the three-dimensional memory storage map $M_t^u$ according to the current position of the unmanned platform, and its size $(X' \times Y')$ determines the spatial granularity of the written feature vectors;
step 5.2, using a learnable parameter matrix $W_r^u$, a reset gate vector $r_t^u = \sigma(W_r^u * [e_t^u; m_t^u])$ is generated by a convolution operation;
step 5.3, using a learnable parameter matrix $W_z^u$, an update gate vector $z_t^u = \sigma(W_z^u * [e_t^u; m_t^u])$ is generated by a convolution operation;
step 5.4, using learnable parameter matrices $W_h^u$ and $U_h^u$, a candidate vector $\tilde{m}_t^u = \tanh(W_h^u * e_t^u + U_h^u * (r_t^u \odot m_t^u))$ is generated by a convolution operation that applies the reset gate $r_t^u$;
step 5.5, the update gate $z_t^u$ integrates $m_t^u$ and the candidate vector, generating $\hat{m}_t^u = (1 - z_t^u) \odot m_t^u + z_t^u \odot \tilde{m}_t^u$;
step 5.6, $\hat{m}_t^u$ is used to replace $m_t^u$ in $M_t^u$, generating the three-dimensional memory storage map $M_{t+1}^u$ for the next time step;
Step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space feature in the current local observation, the global historical information and the context information extracted from the respective three-dimensional memory storage mapping, each unmanned platform finishes the three-dimensional memory storage mapping Cubic Map, and the current space feature information is subjected to the Cubic Map
Figure BDA00032160134200000825
Storing profiles
Figure BDA00032160134200000826
And contextual information
Figure BDA00032160134200000827
Feature vectors are generated using convolution:
Figure BDA00032160134200000828
wherein phioutput(. -) represents a convolution operation;
step 7, each unmanned platform uses a strategy function to generate actions and a value function to generate value estimation based on the characteristic vectors, and each unmanned platform executes the generated actions to obtain reward values;
step 7.1, the unmanned platform $u$ feeds the feature vector $h_t^u$ separately into the policy function $\pi_{\theta_u}(\cdot \mid h_t^u)$ and the value function $V_{\theta_u}(h_t^u)$, generating an action $a_t^u$ and a value estimate $v_t^u$;
step 7.2, each unmanned platform $u$ executes the action $a_t^u$, obtains a reward value $R_t^u$, and enters the next time step;
step 8, repeatedly executing the steps 2-7 until the data acquisition task is finished, and optimizing a strategy function and a value function based on respective track data by each unmanned platform;
step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;
step 8.2, each unmanned platform $u$ collects its trajectory data $\tau_u = \{(o_t^u, a_t^u, R_t^u, v_t^u)\}_{t=0}^{T}$ and, according to $\tau_u$, calculates the cumulative reward estimate $\hat{R}_i^u = \sum_{k=i}^{T} \gamma^{k-i} R_k^u$ for each time step $i$, where $\gamma \in [0, 1]$ is a discount factor, and the advantage estimate in the GAE manner, $\hat{A}_i^u = \sum_{l=0}^{T-i} (\gamma\lambda)^l \delta_{i+l}^u$, where $\lambda \in [0, 1]$ is a discount factor and the temporal-difference error is $\delta_i^u = R_i^u + \gamma v_{i+1}^u - v_i^u$;
step 8.3, each unmanned platform $u$ slices $\tau_u$ into segments of length $K$ along the time dimension and adds the generated sequence samples to the sample library $\mathcal{D}_u$;
step 8.4, each unmanned platform $u$ samples $M$ sequence samples from the sample library $\mathcal{D}_u$ in a mini-batch manner and updates the parameters $\theta_u$ based on the joint loss function $L(\theta_u)$ of PPO as given in formulas (14) to (16), where $L^{policy}$ is the loss function of the policy function, $L^{value}$ is the loss function of the value function, $S$ is the policy entropy serving as a regularization term related to the policy function, and $c_1, c_2, \epsilon_1, \epsilon_2$ are all constants, before entering the next round;
and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.
In step 1 of the above embodiment, when the human-computer cooperative data acquisition task is performed, $\mathcal{U}$ denotes the unmanned platform cluster and $\mathcal{L}$ denotes the crowd. Within one round, over the time range $[0, T]$, the unmanned platform cluster and the crowd jointly collect data from the low-cost sensors $\mathcal{P}$. The unmanned platform cluster is usually restricted to flying below a certain altitude; for example, in the United States, according to LAANC (Low Altitude Authorization and Notification Capability), unmanned platforms can fly up to 120 meters in controlled airspace. Because regulations on unmanned platform flying altitude differ between regions, all buildings are regarded as obstacles that unmanned platforms cannot fly over. In addition, unmanned platform charging stations $\mathcal{C}$ are deployed in the city, for example in parking lots, so that unmanned platforms can go there to replenish energy. Without loss of generality, a time-slot system is adopted: the whole perception task is divided into $T$ equal discrete time steps, and all unmanned platforms and the crowd move continuously in a two-dimensional environment. Within the time step $[t, t+1)$, each unmanned platform $u$ can move a distance $\delta_t^u \le \delta_{max}$ in any direction, where $\delta_{max}$ is the maximum moving distance calculated by the unmanned platform from its maximum moving speed within one time step, and $(x_t^u, y_t^u)$ is the position of the unmanned platform $u$ at the initial moment of the time step $[t, t+1)$. Each sensor $p$ is provided with a data amount $d_t^p$ for the unmanned platform cluster and the crowd to collect, where $d_t^p$ denotes the data amount of sensor $p$ at the initial moment of the time step $[t, t+1)$. In each time step $[t, t+1)$, if a sensor $p$ is within the data perception range of an unmanned platform $u$ and of a person $l$, the unmanned platform $u$ collects $d_{max}^{u}$ units of data and the person $l$ collects $d_{max}^{l}$ units of data from it, where $d_{max}^{u}$ and $d_{max}^{l}$ are constants representing the maximum amount of data that an unmanned platform and a person, respectively, can collect from a single sensor in a single time step. The total data amounts collected by the unmanned platform $u$ and the person $l$ in the time step $[t, t+1)$ can accordingly be expressed as the sums of the amounts collected from the sensors in $\mathcal{P}_t^u$ and $\mathcal{P}_t^l$, where $\mathcal{P}_t^u$ and $\mathcal{P}_t^l$ respectively denote the sensors within the data perception ranges of the unmanned platform $u$ and the person $l$ during the time step $[t, t+1)$. At the beginning of the data acquisition task, each unmanned platform $u$ is provided with an initial energy $e_0$. In each time step $[t, t+1)$, the unmanned platform $u$ consumes $\eta \cdot \delta_t^u$ energy for its movement, where $\eta$ is the movement energy consumption factor. At the beginning of each time step $[t, t+1)$, if the unmanned platform $u$ is within the charging range of a certain charging station and its remaining energy is below $e_0$, the unmanned platform $u$ is charged back to $e_0$ energy. When the human-computer cooperative data acquisition task starts, each unmanned platform $u$ empties its sample library $\mathcal{D}_u$, randomly initializes the parameters $\theta_u$, and sets the current time step $t = 0$; the unmanned platform cluster and the crowd then begin interacting with the environment.
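The per-platform dynamics described above can be condensed into the following sketch. Obstacle collisions and charging stations are omitted, and every name and default is an illustrative assumption rather than the disclosed model.

```python
import numpy as np

def step_platform(pos, move, sensor_pos, sensor_data,
                  energy, delta_max, sense_radius, d_max, eta):
    """One time step [t, t+1) for a single platform u: the move is
    clipped to delta_max, every sensor within sense_radius yields at
    most d_max data, and movement costs eta energy per unit distance."""
    dist = float(np.linalg.norm(move))
    if dist > delta_max:                  # enforce the distance bound
        move = move * (delta_max / dist)
        dist = delta_max
    pos = pos + move
    energy = energy - eta * dist          # movement energy consumption
    in_range = np.linalg.norm(sensor_pos - pos, axis=1) <= sense_radius
    collected = np.minimum(sensor_data[in_range], d_max)
    sensor_data[in_range] -= collected    # sensors are drained in place
    return pos, energy, float(collected.sum())
```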
The simulation experiments of the above embodiment use a set of crowd movement trajectory data from NCSU (North Carolina State University) obtained from CRAWDAD. The NCSU dataset contains 35 crowd movement trajectories generated by 32 college students who carried GPS receivers recording their movements in daily life; each trajectory records a selected student's position every 30 seconds over several hours. Google Maps is used to mark the map data, including the positions and shapes of buildings, lakes, and hills. The NCSU campus spans 1790.18 meters north-south and about 2028.70 meters east-west, covering roughly 3.63 million square meters. 104 sensors are placed on 99 buildings, and the initial data amount $d_0^p$ of each sensor is randomly generated between 1 GB and 1.5 GB. The initial position of each unmanned platform is the central point of the scene, the maximum flight speed is 12 km/h, the initial energy is $e_0 = 20\,\mathrm{kJ}$, and the movement energy consumption factor is $\eta = 0.01\,\mathrm{kJ/m}$. The data perception radii of a person and an unmanned platform are 50 meters and 60 meters respectively, the rates of data acquisition from a single sensor are 8.3 Mbps and 166.7 Mbps respectively, and, considering the cable length, the charging radius of a charging station is 20 meters.
In step 2 of the above embodiment, in contrast to an observation of global scope, the observation of the area centered on the unmanned platform with a certain distance as its radius is called the local observation;
in step 7 of the above embodiment, an action generated by the policy function is, for example, the pair of movement distances along the two coordinate axes of a two-dimensional coordinate system; each unmanned platform performs the generated action to obtain a reward value, for example a negative reward when an obstacle is hit and a positive reward when data is collected.
In order to further show the performance of this embodiment on the human-computer cooperative mobile group perception task, a complete and thorough system test is carried out. The evaluation uses the following 6 indexes, each measured when a round finishes:
1. Data acquisition rate: the proportion of the total data amount collected by all unmanned platforms to the total initial data amount of the sensors.
2. Geographic fairness (ξ): the Jain fairness index of the data acquired by all unmanned platforms across geographic areas.
3. Cooperation factor (ζ): the degree of cooperation between the unmanned platform cluster and the crowd.
4. Energy consumption rate (β): the proportion of the energy consumed by the movement of all unmanned platforms to the sum of the initial energy and the replenished energy of all unmanned platforms.
5. Crowd utilization rate: the proportion of the data amount actually collected by the crowd to the data amount the participating subset of the population would collect in the ideal case (without unmanned platforms).
6. Data acquisition efficiency (λ): the aim of the invention is to maximize the data acquisition rate, the geographic fairness (ξ) and the cooperation factor (ζ) while minimizing the energy consumption rate (β); these are synthesized into the single index λ, which grows with the first three indexes and shrinks with the energy consumption rate.
in addition, the following 6 reference techniques were used for comparison:
1. FD-MAPPO (Neural Map): Neural Map is an existing, advanced memory storage structure that maintains a two-dimensional memory map with read, write, update and output operations; to compare Neural Map with Cubic Map, Neural Map is paired with the FD-MAPPO training framework provided by the invention.
2. RPG: an existing, advanced multi-agent deep reinforcement learning technique that uses a reward-randomized exploration technique on top of a policy gradient algorithm to achieve better performance.
3. IPPO: a multi-agent deep reinforcement learning technique based on the PPO algorithm in which all agents share parameters.
4. PPO: an advanced single-agent deep reinforcement learning technique.
5. e-Divert: a multi-agent deep reinforcement learning technique based on the MADDPG algorithm that uses a distributed experience pool and LSTM to obtain better performance; it is the most advanced multi-agent deep reinforcement learning crowd sensing technique.
6. Random: each unmanned platform moves with a random strategy.
The number U of unmanned platforms in the scene and the crowd participation ratio ω are used in turn as independent variables, with the above evaluation indexes as dependent variables, namely the data acquisition efficiency (λ), the data acquisition rate, the geographic fairness (ξ), the cooperation factor (ζ), the energy consumption rate (β) and the crowd utilization rate; two groups of tests were performed:
as shown in fig. 2, the learning framework consistently outperforms all other baseline techniques in terms of efficiency, for the following reasons. PPO uses only one agent to control the behavior of multiple unmanned platforms and may miss the feedback of a particular platform, so cooperation inside the unmanned platform cluster cannot be sufficiently achieved. Although e-Divert uses multiple agents, the deterministic strategy it adopts, DDPG, is not good at action exploration, which is crucial in this environment (for example, in the NCSU environment the north and south campuses are connected only through two thin tunnels). IPPO performs far better than PPO and e-Divert because it uses multiple agents and a stochastic policy; however, parameter sharing across agents limits the potential of each unmanned platform to accurately capture observed features. RPG and FD-MAPPO (Neural Map) perform better than the other baseline techniques but still worse than FD-MAPPO (Cubic Map): RPG explores using reward-randomized exploration but neglects the spatiotemporal modeling of long trajectory sequences, where an unmanned platform easily becomes disoriented and trapped among obstacles, while FD-MAPPO (Neural Map) attempts to model spatiotemporal correlation with a two-dimensional map but, because it flattens the 3D tensor into a 1D tensor in the write operation, loses almost all spatial information in the modeling process;
as shown in figs. 2-5, as the number of unmanned platforms increases, the data acquisition efficiency (λ), data acquisition rate, geographic fairness (ξ) and cooperation factor (ζ) of all methods first rise and then gradually saturate. As shown in fig. 6, with more unmanned platforms the energy consumption rate (β) of all methods except the e-Divert technique also first rises and then saturates: more unmanned platforms make it possible to explore corners and remote areas and collect richer data, but too many unmanned platforms bring no additional benefit, so the energy consumption rate (β) gradually saturates. In contrast, the energy consumption rate (β) of the e-Divert technique first falls and then saturates; inspection of the platform trajectories shows that, owing to e-Divert's poor obstacle avoidance and exploration capability, the areas explored by newly added platforms are corners with extremely little data, so those platforms are blocked by obstacles and stop moving at an early stage, and thus consume less energy;
as shown in fig. 7, the crowd utilization rate decreases as more unmanned platforms are used. When there are few unmanned platforms, the FD-MAPPO (Cubic Map) technique maintains a relatively high crowd utilization rate by navigating the platforms to regions with few people, achieving the highest efficiency compared with the FD-MAPPO (Neural Map) and RPG techniques. As more unmanned platforms are deployed, they are also distributed to areas where sensors are densely deployed and where it is difficult to collect all the data, even though people may also be present in these areas, in order to collect as much data as possible. If an unmanned platform intentionally bypassed the crowd to achieve a nominally better 'cooperation' level, the crowd utilization rate would increase, but moving long distances leads to a higher energy consumption rate and ultimately still reduces efficiency.
As shown in FIG. 8, the crowd utilization ratio can be observed
Figure BDA0003216013420000125
This is always going down because 25% of the people initially enabled are already enough, so adding more people does not bring additional benefit to the data collection task that the crowd does in cooperation with the unmanned platform, Random, e-Divert, PPO and IPPO technologies do not perform well but crowd utilization is poor
Figure BDA0003216013420000126
Are relatively high because they may explore areas where a crowd is present and where the crowd has collected almost all of the data, the present invention maintains relatively high crowd utilization compared to RPG and FD-mappo (neural map) technologies
Figure BDA0003216013420000127
This is because FD-MAPPO (Cubic map) reduces the probability of redundant data collection by human-machine, and when the number of unmanned platforms is small (U ≦ 4), the unmanned platforms will be navigated to areas where the crowd cannot collect all data alone.
The present invention is not limited to the above-described embodiments, which are described in the specification and illustrated only to explain the principle of the present invention; various changes and modifications may be made without departing from the spirit and scope of the present invention as claimed. The scope of the invention is defined by the appended claims.

Claims (9)

1. A man-machine cooperative perception method based on multi-agent space-time modeling and decision making is characterized by comprising the following steps:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample library of each unmanned platform, randomly initializing a data acquisition strategy of each unmanned platform, and starting a data acquisition task in a fully distributed mode in cooperation with people;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, and global history information is extracted from each three-dimensional memory Map by using global convolution reading operation;
step 4, based on the spatial features in its local observation and the global historical information extracted from its three-dimensional memory storage map, each unmanned platform uses a read operation based on context cross-correlation and weights the information in the three-dimensional memory storage map according to the cross-correlation coefficients between that information and the local spatial features and the global historical information;
step 5, each unmanned platform carries out local updating on the three-dimensional memory storage mapping based on the space characteristics in the current local observation;
step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space features in the current local observation, the global history information and the context information extracted from the respective three-dimensional memory mapping, and each unmanned platform finishes the three-dimensional memory mapping Cubic Map;
step 7, each unmanned platform uses a strategy function to generate actions and a value function to generate value estimation based on the characteristic vectors, and each unmanned platform executes the generated actions to obtain reward values;
step 8, repeatedly executing the steps 2-7 until the data acquisition task is finished, and optimizing a strategy function and a value function based on respective track data by each unmanned platform;
and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.
2. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein step 1 comprises:
step 1.1, for each unmanned platform $u$ in the unmanned platform cluster $\mathcal{U}$, emptying its sample library $\mathcal{D}_u$ and randomly initializing its parameters $\theta_u$;
and step 1.2, initializing the time step $t = 0$, and starting to interact with the human-computer cooperative crowd sensing environment.
3. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 2 comprises:
step 2.1, for the current time step $t$, the human-computer cooperative crowd sensing environment has a global state $s_t$, and each unmanned platform $u$ obtains a corresponding local observation $o_t^u$ according to its position in the global space;
step 2.2, each unmanned platform $u$ uses its convolutional neural network $\phi(\cdot)$ to extract the spatial features $e_t^u = \phi(o_t^u)$ from its local observation.
4. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 3, the global historical spatiotemporal information is stored in a three-dimensional memory storage map $M_t^u$, the global convolutional read operation treats all stored data as a whole, and a convolutional neural network is used to extract the global information, as shown in the following formula (1):

$$g_t^u = \phi_{read}(M_t^u) \tag{1}$$

In formula (1): $\phi_{read}(\cdot)$ represents a convolutional neural network.
5. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 4 comprises:
step 4.1, using a learnable parameter matrix $W_q^u$, a query vector $q_t^u$ is extracted from the current local spatial features $e_t^u$ and the global features $g_t^u$ by a convolution operation, as shown in the following formula (2):

$$q_t^u = W_q^u * [e_t^u; g_t^u] \tag{2}$$

In formula (2): $*$ denotes the convolution (matrix multiplication) and $[\cdot;\cdot]$ denotes the concatenation of vectors;
step 4.2, the cross-correlation coefficient matrix between the query vector $q_t^u$ and the three-dimensional memory storage map $M_t^u$ is calculated, as shown in the following formula (3):

$$A_t^u = \sigma(q_t^u \star M_t^u) \tag{3}$$

In formula (3): $\sigma$ denotes the sigmoid activation function and $\star$ denotes the calculation of cross-correlation coefficients;
step 4.3, the cross-correlation coefficient matrix $A_t^u$ is used to weight the three-dimensional memory storage map $M_t^u$, and a context vector $c_t^u$ is generated by convolving the weighted result, as shown in the following formula (4):

$$c_t^u = \phi_c\big(f_c(A_t^u) \odot M_t^u\big) \tag{4}$$

In formula (4): $\odot$ denotes element-wise multiplication, and $f_c(\cdot)$ expands the two-dimensional matrix $A_t^u$ into a three-dimensional tensor by replicating the data along the third dimension, as shown in the following formula (5):

$$f_c(A)[x, y, z] = A[x, y] \tag{5}$$
6. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 5 comprises:
step 5.1, the cubic region $m_t^u$ to be updated is selected from the three-dimensional memory storage map $M_t^u$ according to the current position of the unmanned platform, and its size $(X' \times Y')$ determines the spatial granularity of the written feature vectors;
step 5.2, using a learnable parameter matrix $W_r^u$, a reset gate vector $r_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation, as shown in the following formula (6):

$$r_t^u = \sigma\big(W_r^u * [e_t^u; m_t^u]\big) \tag{6}$$

step 5.3, using a learnable parameter matrix $W_z^u$, an update gate vector $z_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation, as shown in the following formula (7):

$$z_t^u = \sigma\big(W_z^u * [e_t^u; m_t^u]\big) \tag{7}$$

step 5.4, using learnable parameter matrices $W_h^u$ and $U_h^u$, a candidate vector $\tilde{m}_t^u$ is generated from the inputs $e_t^u$ and $m_t^u$ by a convolution operation that applies the reset gate $r_t^u$, as shown in the following formula (8):

$$\tilde{m}_t^u = \tanh\big(W_h^u * e_t^u + U_h^u * (r_t^u \odot m_t^u)\big) \tag{8}$$

step 5.5, the update gate $z_t^u$ integrates $m_t^u$ and the candidate vector $\tilde{m}_t^u$, giving the following formula (9):

$$\hat{m}_t^u = (1 - z_t^u) \odot m_t^u + z_t^u \odot \tilde{m}_t^u \tag{9}$$

step 5.6, $\hat{m}_t^u$ is used to replace $m_t^u$ in $M_t^u$, generating the three-dimensional memory storage map $M_{t+1}^u$ for the next time step.
7. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 6, the current spatial feature information $e_t^u$, the global feature $g_t^u$ and the context information $c_t^u$ are joined by a concatenation operation, and a feature vector $h_t^u$ is generated by applying a convolution to the concatenation result, as shown in the following formula (10):

$$h_t^u = \phi_{output}\big([e_t^u; g_t^u; c_t^u]\big) \tag{10}$$

In formula (10): $\phi_{output}(\cdot)$ denotes a convolution operation and $[\cdot;\cdot]$ denotes the concatenation of vectors.
8. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 7 comprises:
step 7.1, the unmanned platform $u$ feeds the feature vector $h_t^u$ separately into the policy function $\pi_{\theta_u}(\cdot \mid h_t^u)$ and the value function $V_{\theta_u}(h_t^u)$, generating an action $a_t^u$ and a value estimate $v_t^u$;
step 7.2, each unmanned platform $u$ executes the action $a_t^u$, obtains a reward value $R_t^u$, and enters the next time step.
9. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 8 comprises:
step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;
step 8.2, each unmanned platform $u$ collects its trajectory data $\tau_u = \{(o_t^u, a_t^u, R_t^u, v_t^u)\}_{t=0}^{T}$ and, according to $\tau_u$, calculates a cumulative reward estimate $\hat{R}_i^u$ and an advantage estimate $\hat{A}_i^u$; the cumulative reward estimate for a time step $i$ is calculated as shown in the following formula (11):

$$\hat{R}_i^u = \sum_{k=i}^{T} \gamma^{k-i} R_k^u \tag{11}$$

In formula (11): $\gamma \in [0, 1]$ is a discount factor; the advantage estimate $\hat{A}_i^u$ is calculated in the GAE manner, as shown in the following formula (12):

$$\hat{A}_i^u = \sum_{l=0}^{T-i} (\gamma\lambda)^l \delta_{i+l}^u \tag{12}$$

In formula (12): $\lambda \in [0, 1]$ is a discount factor, and the temporal-difference error $\delta_i^u$ is calculated as shown in the following formula (13):

$$\delta_i^u = R_i^u + \gamma v_{i+1}^u - v_i^u \tag{13}$$

step 8.3, each unmanned platform $u$ slices $\tau_u$ into segments of length $K$ along the time dimension and adds the generated sequence samples to the sample library $\mathcal{D}_u$;
step 8.4, each unmanned platform $u$ samples $M$ sequence samples from the sample library $\mathcal{D}_u$ in a mini-batch manner, updates the parameters $\theta_u$ based on the joint loss function $L(\theta_u)$ of PPO, and then enters the next round, where $L^{policy}$ is the loss function of the policy function, $L^{value}$ is the loss function of the value function, and $S$ is a regularization term related to the policy function; the calculation formulas are as follows (14) to (16):

$$L(\theta_u) = \mathbb{E}_i\Big[L_i^{policy}(\theta_u) + c_1 L_i^{value}(\theta_u) - c_2 S_i(\theta_u)\Big] \tag{14}$$

$$L_i^{policy}(\theta_u) = \min\Big(\rho_i \hat{A}_i^u,\ \mathrm{clip}(\rho_i,\, 1-\epsilon_1,\, 1+\epsilon_1)\, \hat{A}_i^u\Big), \qquad \rho_i = \frac{\pi_{\theta_u}(a_i^u \mid h_i^u)}{\pi_{\theta_u^{old}}(a_i^u \mid h_i^u)} \tag{15}$$

$$L_i^{value}(\theta_u) = \max\Big(\big(v_i^u - \hat{R}_i^u\big)^2,\ \big(\mathrm{clip}(v_i^u,\, v_i^{u,old}-\epsilon_2,\, v_i^{u,old}+\epsilon_2) - \hat{R}_i^u\big)^2\Big) \tag{16}$$

In formula (14): $S$ is the policy entropy, and $c_1, c_2, \epsilon_1, \epsilon_2$ are all constants.
CN202110943514.0A 2021-08-17 2021-08-17 Man-machine collaborative perception method based on multi-agent space-time modeling and decision Active CN113805568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110943514.0A CN113805568B (en) 2021-08-17 2021-08-17 Man-machine collaborative perception method based on multi-agent space-time modeling and decision

Publications (2)

Publication Number Publication Date
CN113805568A true CN113805568A (en) 2021-12-17
CN113805568B CN113805568B (en) 2024-04-09

Family

ID=78893696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110943514.0A Active CN113805568B (en) 2021-08-17 2021-08-17 Man-machine collaborative perception method based on multi-agent space-time modeling and decision

Country Status (1)

Country Link
CN (1) CN113805568B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203689A (en) * 2016-07-04 2016-12-07 大连理工大学 A kind of Hydropower Stations cooperation Multiobjective Optimal Operation method
CN110404264A (en) * 2019-07-25 2019-11-05 哈尔滨工业大学(深圳) It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game
CN112651486A (en) * 2020-12-09 2021-04-13 中国人民解放军陆军工程大学 Method for improving convergence rate of MADDPG algorithm and application thereof
CN112880688A (en) * 2021-01-27 2021-06-01 广州大学 Unmanned aerial vehicle three-dimensional flight path planning method based on chaotic self-adaptive sparrow search algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Youqi Li et al., "MP-Coopetition: Competitive and Cooperative Mechanism for Multiple Platforms in Mobile Crowd Sensing," IEEE Transactions on Services Computing, vol. 14, no. 6, pp. 1935-1947 *
Yu Wang et al., "Human-Drone Collaborative Spatial Crowdsourcing by Memory-Augmented and Distributed Multi-Agent Deep Reinforcement Learning," 2022 IEEE 38th International Conference on Data Engineering, pp. 459-471 *

Also Published As

Publication number Publication date
CN113805568B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant