CN113805568A - Man-machine cooperative perception method based on multi-agent space-time modeling and decision making - Google Patents
- Publication number
- CN113805568A (application number CN202110943514.0A)
- Authority
- CN
- China
- Prior art keywords
- unmanned platform
- dimensional memory
- agent
- unmanned
- following formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision making, which comprises the following steps: step 1, starting the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO; step 2, each unmanned platform extracting the spatial features in its local observation using its own convolutional neural network; step 3, extracting global historical information from the respective three-dimensional memory map; step 4, extracting the context information critical to the current unmanned platform state; step 5, locally updating the three-dimensional memory map; step 6, each unmanned platform completing the output operation of the three-dimensional memory map Cubic Map; step 7, generating a value estimate with the value function, each unmanned platform executing the generated action to obtain a reward value; step 8, repeatedly executing steps 2-7; and step 9, repeatedly executing steps 1-8. The method achieves a better perception data acquisition effect and can be widely applied to scenes with a large area, a complex environment and difficult communication.
Description
Technical Field
The invention relates to the technical field of mobile group perception, in particular to a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.
Background
The mobile group perception technology is a leading-edge research direction combining the Internet of Things and artificial intelligence. A large number of mobile devices used by ordinary users serve as basic perception units, and the Internet of Things and the mobile Internet are coordinated to realize perception task distribution and perception data collection and utilization, so that large-scale and complex urban and social perception tasks are finally completed. However, mobile group perception systems based on mobile devices are often affected by various factors, such as the uncertainty of user movement and quality problems of the mobile devices, and these factors may cause low quality of collected data and poor user satisfaction.
In addition to mobile group perception technology with people at its core, the rapid development of unmanned platform technologies such as unmanned aerial vehicles and unmanned vehicles has made it practical to use unmanned platforms to collect and propagate perception data in urban environments.
Considering that people and mobile unmanned platforms (such as unmanned express delivery vehicles) now coexist in cities, man-machine cooperative group perception can make up for the quality problems of purely people-based group perception and the cost problems of purely unmanned-platform-based group perception. By fully utilizing the people in the city and deploying unmanned platforms to collect data from low-cost sensors distributed on buildings, the data acquisition requirements of smart cities can be better met.
However, in a real-world scenario, the technical challenges mainly faced by the man-machine collaborative mobile group perception technology are as follows:
technical challenge 1: the existing multi-agent learning technology based on centralized training cannot be used in real environments. Existing multi-agent learning uses a centralized training mode in which each agent trains itself using its local observation together with global information obtained through information sharing; however, because the area involved in information sharing is usually large and contains many obstacles that block communication signals, centralized training cannot be carried out in real environments.
Technical challenge 2: modeling the complex spatio-temporal information about moving crowds and dense obstacles in cities is difficult. To maximize the utilization of people in a city, it must be considered that most people are uncontrollable, so the agent is required to plan its own data acquisition strategy according to how the spatial distribution of people changes over time.
Based on the problems in the prior art, the invention provides a man-machine cooperative perception method based on multi-agent space-time modeling and decision-making.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a human-computer cooperative perception method based on multi-agent space-time modeling and decision-making.
The invention adopts the following technical scheme:
a man-machine cooperative perception method based on multi-agent space-time modeling and decision making comprises the following steps:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample library of each unmanned platform, randomly initializing a data acquisition strategy of each unmanned platform, and starting a data acquisition task in a fully distributed mode in cooperation with people;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 3, each unmanned platform starts a three-dimensional memory map Cubic Map and extracts global historical information from its own three-dimensional memory map using a global convolutional read operation;
step 4, each unmanned platform extracts, from the three-dimensional memory map, the context information critical to its current state using a context-based read operation;
step 5, each unmanned platform locally updates the three-dimensional memory map based on the spatial features in its current local observation;
step 6, each unmanned platform completes the output operation of the three-dimensional memory map Cubic Map to generate a feature vector;
step 7, each unmanned platform generates an action with its policy function and a value estimate with its value function, executes the generated action, and obtains a reward value;
step 8, steps 2-7 are executed repeatedly until the data acquisition task ends, and each unmanned platform updates its parameters;
and step 9, steps 1-8 are executed repeatedly until the human-machine cooperative data acquisition efficiency stabilizes, and the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO ends.
Further, step 1 comprises:
step 1.1, each unmanned platform u in the unmanned platform cluster empties its sample library D_u and randomly initializes its parameters θ_u;
and step 1.2, the time step t is initialized to 0, and interaction with the human-machine cooperative crowd-sensing environment begins.
Further, step 2 comprises:
step 2.1, for the current time step t, the human-machine cooperative crowd-sensing environment has a global state s_t, and each unmanned platform u obtains a corresponding local observation o_t^u according to its position in the global space;
step 2.2, each unmanned platform u uses a convolutional neural network φ(·) to extract the spatial features f_t^u = φ(o_t^u) from its local observation.
Further, in step 3, the global historical spatio-temporal information is stored in the three-dimensional memory map M_t^u. The global convolutional read operation treats all stored data as a whole and extracts the global feature with a convolutional neural network, as shown in the following formula (1):

m̄_t^u = φ_read(M_t^u)   (1)

In formula (1): φ_read(·) represents a convolutional neural network.
Further, in step 2.1, the global state s_t is a three-dimensional tensor whose first two dimensions correspond to the two-dimensional spatial coordinates. Let f_s map continuous coordinate values to discrete coordinate values, and let (x_t^u, y_t^u) be the continuous coordinates of unmanned platform u at the current time step t; then o_t^u = s_t[f_s(x_t^u)−j : f_s(x_t^u)+j, f_s(y_t^u)−j : f_s(y_t^u)+j], where j controls the range of the local observation.
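The discretization f_s and the windowed crop just described can be sketched as follows; the 50-metre cell size, zero padding at the borders, and all function names are illustrative assumptions, not values from the patent:

```python
import numpy as np

def f_s(coord, cell_size=50.0):
    """Map a continuous coordinate (metres) to a discrete grid index."""
    return int(coord // cell_size)

def local_observation(global_state, pos, j, cell_size=50.0):
    """Crop a (2j+1) x (2j+1) window of the global state grid centred on
    the platform's discretised position; zero-pad outside the map."""
    X, Y, C = global_state.shape
    gx, gy = f_s(pos[0], cell_size), f_s(pos[1], cell_size)
    obs = np.zeros((2 * j + 1, 2 * j + 1, C))
    for dx in range(-j, j + 1):
        for dy in range(-j, j + 1):
            x, y = gx + dx, gy + dy
            if 0 <= x < X and 0 <= y < Y:
                obs[dx + j, dy + j] = global_state[x, y]
    return obs
```

The parameter j plays the same role as in the text: it fixes the radius of the local observation window around the platform.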
Further, step 4 comprises:
step 4.1, using a learnable parameter matrix W_q, a query vector q_t^u is extracted from the current local spatial features f_t^u and the global feature m̄_t^u by a convolution operation, as shown in the following formula (2):

q_t^u = W_q * [f_t^u; m̄_t^u]   (2)

In formula (2): * denotes matrix multiplication, and [;] denotes the concatenation of vectors;
step 4.2, the matrix A_t^u of cross-correlation coefficients between the query vector q_t^u and the three-dimensional memory map M_t^u is calculated, as shown in the following formula (3):

A_t^u = σ(q_t^u ⊗ M_t^u)   (3)

In formula (3): σ denotes the sigmoid activation function, and ⊗ denotes the calculation of cross-correlation coefficients;
step 4.3, the cross-correlation coefficient matrix A_t^u is used to weight the three-dimensional memory map M_t^u, and a context vector c_t^u is generated by convolving the weighted result, as shown in the following formula (4):

c_t^u = φ_c(f_c(A_t^u) ⊙ M_t^u)   (4)

In formula (4): f_c(·) expands the two-dimensional matrix A_t^u into a three-dimensional tensor by replicating the data along the third dimension, as shown in the following formula (5):

f_c(A)[x, y, z] = A[x, y]   (5)

In formula (4): ⊙ denotes element-wise multiplication.
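The read of steps 4.1-4.3 amounts to an attention-style lookup over the memory cube. A minimal NumPy sketch under two simplifying assumptions: the 1x1 convolution is replaced by a plain matrix multiplication, and the output convolution φ_c is replaced by mean pooling; all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_read(f_local, m_global, memory, W_q):
    """Context-based read over an (X, Y, C) memory cube.
    W_q stands in for the learnable 1x1-conv parameter matrix."""
    # query vector from the concatenated local + global features
    q = np.concatenate([f_local, m_global]) @ W_q           # shape (C,)
    # cross-correlation coefficient per memory cell, squashed by sigmoid
    A = sigmoid(np.einsum('xyc,c->xy', memory, q))          # shape (X, Y)
    # broadcast A over the channel axis (the f_c replication), weight the
    # memory, and pool the weighted cube into a context vector
    c = (memory * A[..., None]).mean(axis=(0, 1))           # shape (C,)
    return c
```

The broadcast `A[..., None]` is exactly the f_c expansion of formula (5): the 2-D coefficient map is replicated along the channel dimension before the element-wise product.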
Further, step 5 comprises:
step 5.1, a cubic region M̂_t^u to be updated is selected from the three-dimensional memory map M_t^u according to the current position of the unmanned platform, where the region size (X′ × Y′) determines the spatial granularity of the written feature vectors;
step 5.2, using a learnable parameter matrix W_r, a reset gate vector r_t^u is generated from the inputs M̂_t^u and f_t^u by a convolution operation, as shown in the following formula (6):

r_t^u = σ(W_r * [M̂_t^u; f_t^u])   (6)

step 5.3, using a learnable parameter matrix W_z, an update gate vector z_t^u is generated from the inputs M̂_t^u and f_t^u by a convolution operation, as shown in the following formula (7):

z_t^u = σ(W_z * [M̂_t^u; f_t^u])   (7)

step 5.4, using learnable parameter matrices W_h and U_h, a candidate vector h_t^u is generated from the inputs M̂_t^u and f_t^u by a convolution operation using the reset gate r_t^u, as shown in the following formula (8):

h_t^u = tanh(W_h * f_t^u + U_h * (r_t^u ⊙ M̂_t^u))   (8)

step 5.5, the selected region is overwritten with the gated combination of its old content and the candidate vector, as shown in the following formula (9):

M̂_{t+1}^u = (1 − z_t^u) ⊙ M̂_t^u + z_t^u ⊙ h_t^u   (9)
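The gated local write is a GRU-style update restricted to the cells near the platform's position. A minimal NumPy sketch that updates a single memory cell, with the convolutions replaced by per-cell matrix multiplications (a simplifying assumption; the weight names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_write(memory, f, pos, W_r, W_z, W_h, U_h):
    """GRU-style gated update of one (X, Y, C) memory cell at `pos`;
    the gates mirror the reset/update/candidate/write steps."""
    x, y = pos
    m = memory[x, y]                        # current cell content, shape (C,)
    inp = np.concatenate([m, f])            # [old content; new feature]
    r = sigmoid(inp @ W_r)                  # reset gate
    z = sigmoid(inp @ W_z)                  # update gate
    h = np.tanh((r * m) @ U_h + f @ W_h)    # candidate vector
    memory = memory.copy()
    memory[x, y] = (1.0 - z) * m + z * h    # gated overwrite of the cell
    return memory
```

Because only the addressed cell changes, the rest of the cube keeps its long-term content intact, which is the point of the position-based local write.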
Further, in step 6, the current spatial feature f_t^u, the global feature m̄_t^u and the context vector c_t^u are concatenated, and a feature vector ô_t^u is generated by applying a convolution to the concatenation result, as shown in the following formula (10):

ô_t^u = φ_output([f_t^u; m̄_t^u; c_t^u])   (10)

In formula (10): φ_output(·) denotes a convolution operation, and [;] denotes the concatenation of vectors.
Further, step 7 comprises:
step 7.1, each unmanned platform u inputs the feature vector ô_t^u separately into its policy function and its value function, generating an action a_t^u and a value estimate;
step 7.2, each unmanned platform u executes the action a_t^u, obtains a reward value r_t^u, and enters the next time step.
Further, step 8 comprises:
step 8.1, steps 2-7 are executed repeatedly until the data acquisition task ends;
step 8.2, each unmanned platform u collects its trajectory data τ^u and uses it to calculate cumulative reward estimates R̂_i^u and advantage estimates Â_i^u. The cumulative reward estimate for time step i is calculated as shown in the following formula (11):

R̂_i^u = Σ_{k=i}^{T} γ^{k−i} r_k^u   (11)

In formula (11): γ ∈ [0, 1] is a discount factor. The advantage estimate is calculated in the GAE manner, as shown in the following formula (12):

Â_i^u = Σ_{k=0}^{T−i} (γλ)^k δ_{i+k}^u   (12)

In formula (12): λ ∈ [0, 1] is a discount factor, and the time-difference deviation δ_i^u is calculated as shown in the following formula (13):

δ_i^u = r_i^u + γ V(o_{i+1}^u) − V(o_i^u)   (13)
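The return and advantage recursions of formulas (11)-(13) can be computed in one backward pass. A minimal NumPy sketch (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Cumulative reward estimates and GAE advantages for one trajectory.
    `values` carries one extra bootstrap entry V(s_T) at the end."""
    T = len(rewards)
    returns = np.zeros(T)
    advs = np.zeros(T)
    running_ret, running_adv = values[-1], 0.0
    for t in reversed(range(T)):
        # TD deviation: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running_adv = delta + gamma * lam * running_adv   # GAE recursion
        running_ret = rewards[t] + gamma * running_ret    # discounted return
        returns[t] = running_ret
        advs[t] = running_adv
    return returns, advs
```

With gamma = lam = 1 and zero values, the advantages reduce to plain reward-to-go sums, which is a quick sanity check on the recursion.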
step 8.3, each unmanned platform u slices τ^u into segments of length K along the time dimension and adds the generated sequence samples to its sample library D_u;
step 8.4, each unmanned platform u samples M sequence samples from its sample library D_u in a mini-batch manner and updates its parameters θ_u based on the joint loss function L(θ_u) in PPO, then enters the next round, where L^π(θ_u) is the loss function of the policy function, L^V(θ_u) is the loss function of the value function, and L^S(θ_u) is a regularization term related to the policy function. The calculation formulas are as follows (14) to (16):

L(θ_u) = E[ −L^π(θ_u) + c_1 L^V(θ_u) − c_2 L^S(θ_u) ]   (14)

L^π(θ_u) = E[ min(ρ_i Â_i^u, clip(ρ_i, 1−∈_1, 1+∈_1) Â_i^u) ], with ρ_i the ratio of the new to the old policy probability of a_i^u   (15)

L^V(θ_u) = E[ max( (V(o_i^u) − R̂_i^u)², (clip(V(o_i^u), V_old − ∈_2, V_old + ∈_2) − R̂_i^u)² ) ]   (16)

In the above: S is the policy entropy defining L^S(θ_u), and c_1, c_2, ∈_1, ∈_2 are all constants.
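The joint objective of step 8.4 combines a clipped policy surrogate, a clipped value loss and an entropy regularizer. A minimal sketch under the assumption that the losses follow the standard PPO form, with c_1, c_2 and the clipping thresholds ∈_1, ∈_2 as constants (names are illustrative):

```python
import numpy as np

def ppo_joint_loss(logp_new, logp_old, adv, v_new, v_old, ret,
                   entropy, c1=0.5, c2=0.01, eps1=0.2, eps2=0.2):
    """Clipped PPO surrogate + clipped value loss + entropy bonus,
    combined into one scalar to be minimized by gradient descent."""
    ratio = np.exp(logp_new - logp_old)
    # policy loss: pessimistic min of clipped and unclipped surrogate
    l_pi = np.minimum(ratio * adv,
                      np.clip(ratio, 1 - eps1, 1 + eps1) * adv).mean()
    # value loss: pessimistic max of clipped and unclipped squared error
    v_clip = v_old + np.clip(v_new - v_old, -eps2, eps2)
    l_v = np.maximum((v_new - ret) ** 2, (v_clip - ret) ** 2).mean()
    l_s = entropy.mean()                     # policy entropy regularizer
    return -(l_pi - c1 * l_v + c2 * l_s)     # negate: maximize the surrogate
```

When the new policy equals the old one (ratio = 1) and the value estimates match the returns, the loss reduces to minus the mean advantage, which is a quick sanity check.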
Compared with the prior art, the invention has the following advantages:
1. the man-machine cooperative perception method based on multi-agent space-time modeling and decision-making is fully distributed in both the training and testing stages and does not depend on any communication, so it can easily be applied to spatially wide and complex scenes. It solves the technical challenge that the existing multi-agent learning technology based on centralized training cannot be used in real scenes. Adopting FD-MAPPO as the training framework of the unmanned platform cluster, it has a better perception data acquisition effect than the existing multi-agent learning technology and can be widely applied to scenes with a large area, a complex environment and difficult communication;
2. the man-machine cooperative perception method based on multi-agent space-time modeling and decision-making adopts the three-dimensional memory map Cubic Map, an original storage structure that, together with the position-based local write operation, stores long-term spatio-temporal sequence data. It keeps the integrity of the global spatial information while recording the internal detail information of local spaces, laying a foundation for better extracting features from long-term spatio-temporal sequence data. With the designed global and context-based read operations and the output operation as the extraction method, it ensures the comprehensiveness and accuracy of feature extraction while providing the required local detail information, solving the technical challenge that complex spatio-temporal information about moving crowds and dense obstacles in cities is difficult to model.
Drawings
FIG. 1 is a schematic diagram illustrating a human-machine cooperative perception method based on multi-agent spatiotemporal modeling and decision-making in an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating an influence of the number U of unmanned platforms on data acquisition efficiency (λ) in the sensing method according to the embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the data acquisition rate in the sensing method according to the embodiment of the present invention;
fig. 4 is a schematic diagram illustrating an influence of the number U of unmanned platforms on geographic fairness (ξ) in the sensing method according to the embodiment of the present invention;
fig. 5 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the cooperation factor (ζ) in the sensing method according to the embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an influence of the number U of unmanned platforms on the energy consumption rate (β) in the sensing method according to the embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the influence of the number U of unmanned platforms on the crowd utilization rate in the sensing method according to the embodiment of the present invention;
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments, it being understood that the embodiments and features of the embodiments of the present application can be combined with each other without conflict.
Examples
As shown in fig. 1, the human-computer cooperative perception method based on multi-agent spatiotemporal modeling and decision includes:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample base of each unmanned platform, randomly initializing a data acquisition strategy (namely initializing a neural network parameter for decision), and starting a data acquisition task in a fully distributed mode in cooperation with a crowd;
step 1.1, each unmanned platform u in the unmanned platform cluster empties its sample library D_u and randomly initializes its parameters θ_u;
step 1.2, the time step t is initialized to 0, and interaction with the human-machine cooperative crowd-sensing environment begins;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 2.1, for the current time step t, the human-machine cooperative crowd-sensing environment has a global state s_t, and each unmanned platform u obtains a corresponding local observation o_t^u according to its position in the global space;
step 2.2, each unmanned platform u uses a convolutional neural network φ(·) to extract the spatial features f_t^u = φ(o_t^u) from its local observation;
Step 3, each unmanned platform starts a three-dimensional memory map Cubic Map and extracts global historical information from its own three-dimensional memory map using a global convolutional read operation. The global historical spatio-temporal information is stored in the three-dimensional memory map M_t^u; the global convolutional read operation treats all stored data as a whole and extracts the global feature m̄_t^u = φ_read(M_t^u) with a convolutional neural network, wherein φ_read(·) represents a convolutional neural network;
step 4, each unmanned platform extracts the context information critical to its current state from the three-dimensional memory map;
step 4.1, using a learnable parameter matrix W_q, a query vector q_t^u = W_q * [f_t^u; m̄_t^u] is extracted from the current local spatial features f_t^u and the global feature m̄_t^u by a convolution operation, wherein * denotes matrix multiplication and [;] denotes the concatenation of vectors;
step 4.2, the cross-correlation coefficient matrix A_t^u = σ(q_t^u ⊗ M_t^u) between the query vector q_t^u and the three-dimensional memory map M_t^u is calculated, where σ denotes the sigmoid activation function and ⊗ denotes the calculation of cross-correlation coefficients;
step 4.3, the cross-correlation coefficient matrix A_t^u is used to weight the three-dimensional memory map, and a context vector c_t^u = φ_c(f_c(A_t^u) ⊙ M_t^u) is generated by convolving the weighted result, where f_c(·) expands the two-dimensional matrix into a three-dimensional tensor by replicating the data along the third dimension, and ⊙ denotes element-wise multiplication;
step 5, each unmanned platform locally updates the three-dimensional memory map based on the spatial features in its current local observation;
step 5.1, a cubic region M̂_t^u to be updated is selected from the three-dimensional memory map according to the current position of the unmanned platform, where the region size (X′ × Y′) determines the spatial granularity of the written feature vectors;
step 5.2, using a learnable parameter matrix W_r, a reset gate vector r_t^u = σ(W_r * [M̂_t^u; f_t^u]) is generated by a convolution operation;
step 5.3, using a learnable parameter matrix W_z, an update gate vector z_t^u = σ(W_z * [M̂_t^u; f_t^u]) is generated by a convolution operation;
step 5.4, using learnable parameter matrices W_h and U_h, a candidate vector h_t^u = tanh(W_h * f_t^u + U_h * (r_t^u ⊙ M̂_t^u)) is generated by a convolution operation using the reset gate, and the selected region is overwritten with the gated combination (1 − z_t^u) ⊙ M̂_t^u + z_t^u ⊙ h_t^u;
step 6, each unmanned platform completes the output operation of the three-dimensional memory map Cubic Map: the current spatial feature f_t^u, the global feature m̄_t^u and the context vector c_t^u are concatenated, and the feature vector ô_t^u = φ_output([f_t^u; m̄_t^u; c_t^u]) is generated by convolving the concatenation result, wherein φ_output(·) represents a convolution operation;
step 7.1, each unmanned platform u inputs the feature vector ô_t^u separately into its policy function and its value function, generating an action a_t^u and a value estimate;
step 7.2, each unmanned platform u executes the action a_t^u, obtains a reward value r_t^u, and enters the next time step;
step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;
step 8.2, each unmanned platform u collects its trajectory data τ^u and uses it to calculate cumulative reward estimates R̂_i^u = Σ_{k=i}^{T} γ^{k−i} r_k^u and advantage estimates, where γ ∈ [0, 1] is a discount factor; the advantage estimate is calculated in the GAE manner as Â_i^u = Σ_{k=0}^{T−i} (γλ)^k δ_{i+k}^u, where λ ∈ [0, 1] is a discount factor and the time-difference deviation is δ_i^u = r_i^u + γ V(o_{i+1}^u) − V(o_i^u);
step 8.3, each unmanned platform u slices τ^u into segments of length K along the time dimension and adds the generated sequence samples to its sample library D_u;
step 8.4, each unmanned platform u samples M sequence samples from its sample library D_u in a mini-batch manner and updates its parameters θ_u based on the joint loss function in PPO, then enters the next round; the joint loss combines the loss of the policy function, the loss of the value function and a regularization term related to the policy function, where S is the policy entropy and c_1, c_2, ∈_1, ∈_2 are all constants;
and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.
In step 1 of the above embodiment, when the human-machine cooperative data acquisition task runs, the unmanned platform cluster and the crowd jointly collect data from low-cost sensors within a round time range [0, T]. The unmanned platform cluster is usually restricted to flying below a certain altitude; for example, in the United States, according to LAANC (Low Altitude Authorization and Notification Capability), unmanned platforms can fly up to 120 meters in controlled airspace. Because regulations on unmanned platform flying altitude differ between regions, all buildings are considered obstacles that unmanned platforms cannot fly over. Furthermore, unmanned platform charging stations are deployed in the city (for example in parking lots) so that unmanned platforms can go there to replenish energy. Without loss of generality, a time-slot system is adopted: the whole perception task is divided into T equal discrete time steps, and all unmanned platforms and people move continuously in a two-dimensional environment. Within the time step [t, t+1), each unmanned platform u can move a distance of at most δ_max in any direction, where δ_max is the maximum moving distance calculated from the unmanned platform's maximum moving speed over one time step. Each sensor p holds a certain data volume for the unmanned platform cluster and the crowd to collect. In each time step [t, t+1), if a sensor p is within the data perception range of an unmanned platform u or of a person l, the unmanned platform u and the person l each collect a fixed amount of data from it; these amounts are constants representing the maximum data volume an unmanned platform and a person, respectively, can collect from a single sensor in a single time step. The total data volume collected by unmanned platform u or by person l in time step [t, t+1) is then the sum over the sensors within their respective data perception ranges. Each unmanned platform u starts the data acquisition task with an initial energy, and at each time step [t, t+1) consumes energy proportional to the distance moved, where η is the mobile energy consumption factor; at the beginning of each time step [t, t+1), if the unmanned platform u is within the charging range of a charging station, it is charged. When a human-machine cooperative data acquisition task starts, each unmanned platform u empties its sample library D_u, randomly initializes its parameters θ_u, the current time step t is set to 0, and the unmanned platform cluster and the crowd start interacting with the environment.
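One movement-and-energy step of the model above can be sketched as follows; the clipping of the commanded displacement to δ_max and the per-metre energy cost η come from the description, while the concrete numbers and names are illustrative:

```python
import math

def move_and_consume(pos, action, energy, eta=0.01, delta_max=1.0):
    """One movement step: clip the commanded displacement to delta_max,
    then charge the platform eta kJ per metre actually moved."""
    dx, dy = action
    dist = math.hypot(dx, dy)
    if dist > delta_max:                  # respect the per-step distance limit
        dx, dy = dx * delta_max / dist, dy * delta_max / dist
        dist = delta_max
    new_pos = (pos[0] + dx, pos[1] + dy)
    return new_pos, energy - eta * dist
```

With η = 0.01 kJ/m as in the simulation settings later in the text, a one-metre move costs 0.01 kJ.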
The simulation experiment of the above embodiment used a set of crowd movement trajectory data from NCSU (North Carolina State University) in the United States, obtained from CRAWDAD. The NCSU dataset contains 35 crowd movement trajectories generated by 32 college students who used GPS receivers to record their movements in daily life; a GPS receiver recorded a selected student's position every 30 seconds over several hours to generate one trajectory. Google Maps was used for marking map data, including the positions and shapes of buildings, lakes and mountains. The north-south span of the NCSU campus is 1790.18 meters, the east-west span about 2028.70 meters, and the floor area about 3.63 million square meters. 104 sensors were placed on 99 buildings, with the data volume of each sensor randomly generated between 1 GB and 1.5 GB. The initial position of each unmanned platform was set to the central point of the scene, the maximum flight speed to 12 km/h, the initial energy to e_0 = 20 kJ, and the mobile energy consumption factor to η = 0.01 kJ/m. The data perception radii of a person and an unmanned platform are 50 meters and 60 meters respectively, the rates of data acquisition from a single sensor are 8.3 Mbps and 166.7 Mbps respectively, and the charging radius of a charging station is 20 meters in consideration of the cable length.
In step 2 of the above embodiment, with respect to the observation of the global scope, the observation in the area with the unmanned platform as the center and a distance as the radius is called local observation;
In step 7 of the above embodiment, the actions generated by the policy function are, for example, the distances to move along the two coordinate axes of the two-dimensional coordinate system; each unmanned platform performs the generated action to obtain a reward value, for example a negative reward when an obstacle is hit and a positive reward when data is collected.
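A reward of the shape just described (positive for collected data, negative for collisions) might look like the sketch below; the weights are illustrative assumptions, not values from the patent:

```python
def step_reward(collected_bytes, hit_obstacle,
                w_data=1.0, w_crash=-10.0):
    """Reward shaping: reward collected data, penalize obstacle hits.
    The weights w_data and w_crash are assumed, not specified."""
    r = w_data * collected_bytes
    if hit_obstacle:
        r += w_crash
    return r
```

The crash penalty dominating the data term discourages trajectories that cut through obstacles even when they pass near data-rich sensors.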
In order to further show the performance of the embodiment in the aspect of the human-computer cooperative mobile group perception task, a complete and thorough system test is carried out, and the specific evaluation form is 6 indexes of the system when a round is finished:
1. Data acquisition rate: the proportion of the total data volume collected by all unmanned platforms to the total initial data volume of the sensors.
2. Geographic fairness (ξ): the Jain fairness index is used to calculate the geographic fairness of the data collected by all unmanned platforms.
3. Cooperation factor (ζ): the degree of cooperation between the unmanned platform cluster and the crowd.
4. Energy consumption rate (β): the proportion of the energy consumed by the movement of all unmanned platforms to the sum of the initial energy and the replenished energy of all unmanned platforms.
5. Crowd utilization rate: the proportion of the data volume actually collected by the crowd to the data volume the crowd would collect in the ideal case (without unmanned platforms), where the crowd considered is a subset of the population.
6. Data acquisition efficiency (λ): the aim of the invention is to maximize the data acquisition rate, the geographic fairness (ξ) and the cooperation factor (ζ) while minimizing the energy consumption rate (β); these are synthesized into the single index λ.
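The Jain fairness index used for metric 2 has a closed form: for collected amounts x_1, ..., x_n it equals (Σx)² / (n · Σx²), which is 1 when all entries are equal and approaches 1/n when one entry dominates. A minimal sketch:

```python
import numpy as np

def jain_fairness(x):
    """Jain fairness index of a vector of collected data amounts:
    (sum x)^2 / (n * sum x^2); 1.0 means perfectly even collection."""
    x = np.asarray(x, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())
```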
in addition, the following 6 reference techniques were used for comparison:
1. FD-MAPPO (Neural Map): Neural Map is an existing, advanced memory storage structure that maintains a two-dimensional memory map through read, write, update and output operations; to compare it with the Cubic Map, Neural Map is paired with the FD-MAPPO training framework proposed by the invention.
2. RPG: this is an existing, advanced multi-agent deep reinforcement learning technique that uses a reward random exploration technique to achieve better performance based on a policy gradient algorithm.
3. IPPO: the method is a multi-agent deep reinforcement learning technology based on a PPO algorithm, wherein each agent shares parameters.
4. PPO: the technology is an advanced single-agent deep reinforcement learning technology.
5. e-Divert: this is a multi-agent deep reinforcement learning technique based on the MADDPG algorithm, which uses a distributed prior experience pool and LSTM to obtain better performance, which is the most advanced multi-agent deep reinforcement learning crowd-sourcing sensing technique.
6. Random: each unmanned platform moves by adopting a random strategy.
Two sets of tests were performed, using the number U of unmanned platforms in the scene and the crowd participation ratio ω as the independent variables; the dependent variables are the evaluation indexes above, namely the data acquisition efficiency (λ), data acquisition rate, geographic fairness (ξ), co-factor (ζ), energy consumption rate (β) and crowd utilization rate:
as shown in fig. 2, the learning framework consistently outperforms all other baseline techniques in terms of efficiency for the following reasons: PPO uses only one agent to control the behavior of multiple unmanned platforms, and may miss feedback from a particular unmanned platform, so that cooperation inside the unmanned platform cannot be sufficiently achieved; although e-river uses multiple agents, its deterministic strategy, DDPG, adopted is not good at action exploration, but is just crucial in the environment (e.g., in NCSU environments, the north and south campuses communicate through only two thin tunnels); IPPO performs far better than PPO and e-river because it uses multiple agents and a random strategy, however, parameter sharing across agents limits the potential of each unmanned platform to accurately capture observed features; RPG and FD-mappo (neural map) perform better than other baseline techniques, but still perform worse than FD-mappo (cubic map) because RPG explores using rewarding stochastic exploration techniques, but neglects spatio-temporal modeling of long trajectory sequences, in which respect the unmanned platform is easily disoriented and trapped in the obstacle; FD-mappo (neural map) tends to model spatio-temporal correlation by a two-dimensional map, but because the technique flattens the 3D tensor into a 1D tensor in the writing operation, the technique loses almost all spatial information in the modeling process;
as shown in FIGS. 2-5, the data acquisition efficiency (λ), data acquisition rate, of all methods when increasing the number of unmanned platformsBoth geographic fairness (ξ) and co-factor (ζ) will rise first and then gradually saturate as shown in fig. 6, with the use of more unmanned platforms, except e-Din addition to the vett technology, the energy consumption rate (β) of all the methods has an ascending trend and then tends to be saturated, because more unmanned platforms are used, so that corners and remote areas can be explored at an opportunity to collect richer data, but too many unmanned platforms do not bring additional benefits, so the energy consumption rate (β) tends to be saturated gradually, however, in contrast, the energy consumption rate (β) corresponding to the e-river technology is a descending trend and then tends to be saturated at first, and by checking the track of the unmanned platforms, the area explored by the newly added unmanned platforms is a corner with extremely few data due to the poor obstacle avoidance and exploration capacity of the e-river, so that the unmanned platforms are blocked by obstacles and stop moving at an early stage, and thus less energy is consumed;
as shown in fig. 7, the crowd utilization ratioWith the use of unmanned platforms increasing and with less unmanned platforms, the FD-mappo (cubic map) technique maintains relatively high crowd utilization by navigating unmanned platforms to those regions with few peopleThe highest efficiencies are achieved compared to FD-mappo (neural map) and RPG technologies, and as more unmanned platforms are deployed, they are also distributed to areas where sensors are densely deployed, and it is difficult to collect all data, although people may also be present in these areas, in order to collect as much data as possible. Crowd utilization if an unmanned platform intentionally bypasses the crowd to achieve a good 'cooperation' levelThis increases, but moving long distances results in higher energy consumption rates and ultimately still results in reduced efficiency.
As shown in Fig. 8, the crowd utilization rate always decreases as the crowd participation ratio grows. The initially enabled 25% of the crowd is already sufficient, so adding more people brings no extra benefit to the data collection task performed by the crowd in cooperation with the unmanned platforms. The Random, e-Divert, PPO and IPPO techniques perform poorly overall yet show relatively high crowd utilization rates, because they tend to explore areas where a crowd is present and where the crowd has already collected almost all of the data. Compared with the RPG and FD-MAPPO (Neural Map) techniques, the invention maintains a relatively high crowd utilization rate, because FD-MAPPO (Cubic Map) reduces the probability of redundant human-machine data collection and, when the number of unmanned platforms is small (U ≤ 4), navigates the unmanned platforms to areas where the crowd alone cannot collect all the data.
The present invention is not limited to the above-described embodiments, which are set out in the specification and drawings only to illustrate the principle of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims.
Claims (9)
1. A man-machine cooperative perception method based on multi-agent space-time modeling and decision making is characterized by comprising the following steps:
step 1, starting a fully distributed multi-agent deep reinforcement learning framework FD-MAPPO, emptying a sample library of each unmanned platform, randomly initializing a data acquisition strategy of each unmanned platform, and starting a data acquisition task in a fully distributed mode in cooperation with people;
step 2, extracting spatial features in respective local observation by each unmanned platform by using respective convolutional neural network;
step 3, each unmanned platform starts a three-dimensional memory Map Cubic Map, and global history information is extracted from each three-dimensional memory Map by using global convolution reading operation;
step 4, each unmanned platform uses reading operation based on context cross-correlation based on the spatial features in respective local observation and the global history information extracted from respective three-dimensional memory storage mapping, and weights the information in the three-dimensional memory storage mapping according to the cross-correlation coefficient between the information in the three-dimensional memory storage mapping and the local spatial features and the global history information;
step 5, each unmanned platform carries out local updating on the three-dimensional memory storage mapping based on the space characteristics in the current local observation;
step 6, each unmanned platform uses convolution operation to generate a feature vector based on the space features in the current local observation, the global history information and the context information extracted from the respective three-dimensional memory mapping, and each unmanned platform finishes the three-dimensional memory mapping Cubic Map;
step 7, each unmanned platform uses a strategy function to generate actions and a value function to generate value estimation based on the characteristic vectors, and each unmanned platform executes the generated actions to obtain reward values;
step 8, repeatedly executing the steps 2-7 until the data acquisition task is finished, and optimizing a strategy function and a value function based on respective track data by each unmanned platform;
and 9, repeatedly executing the steps 1-8 until the human-computer cooperative data acquisition efficiency is kept stable, and ending the fully distributed multi-agent deep reinforcement learning framework FD-MAPPO.
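The control flow of steps 1-8 above can be sketched as the following skeleton. Every learning component (the CNN encoder, the Cubic Map, the PPO update) is replaced by a trivial placeholder, so this illustrates only the fully distributed loop, not the invention's actual networks.

```python
import random

class Platform:
    """Toy stand-in for one unmanned platform agent; real components are
    replaced by placeholders for illustration only."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.buffer = []               # step 1: per-agent sample library
    def observe(self, state):          # step 2: local observation -> features
        return state
    def act(self, features):           # steps 3-7 collapsed: memory read/write + policy
        return (self.rng.uniform(-1, 1), self.rng.uniform(-1, 1))
    def store(self, transition):
        self.buffer.append(transition)
    def update(self):                  # step 8: optimize from own trajectory
        self.buffer.clear()

def run_episode(platforms, horizon):
    state = 0.0
    for _ in range(horizon):           # repeat steps 2-7 until the task ends
        for p in platforms:
            action = p.act(p.observe(state))
            p.store((state, action, 0.0))
        state += 1.0
    for p in platforms:                # step 8: fully distributed update
        p.update()
    return state
```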
2. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein step 1 comprises:
step 1.1, for each unmanned platform u in the unmanned platform cluster, emptying its sample library and randomly initializing its parameters θu;
step 1.2, initializing the time step t to 0 and beginning to interact with the human-machine cooperative crowd sensing environment.
3. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 2 comprises:
step 2.1, at the current time step t, the human-computer cooperative crowd sensing environment has a global state st, and each unmanned platform u obtains a corresponding local observation according to its position in the global space.
4. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 3, global historical spatiotemporal information is stored in the three-dimensional memory storage map; a global convolution read operation treats all stored data as a whole and uses a convolutional neural network to extract the global information, as shown in the following formula (1):
in formula (1): φread(·) represents a convolutional neural network.
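A minimal numerical analogue of the global read in formula (1): a single 3D kernel spanning the feature dimension is slid over the whole memory to produce a global feature map. The shapes and the single-kernel design are illustrative assumptions standing in for the CNN φread.

```python
import numpy as np

def global_read(memory, kernel):
    """Slide one kernel over the whole 3D memory (a stand-in for the CNN
    phi_read of formula (1)); the kernel spans the full feature dimension."""
    d, h, w = memory.shape
    kd, kh, kw = kernel.shape
    assert kd == d, "kernel is assumed to span the feature dimension"
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            out[i, j] = float(np.sum(memory[:, i:i + kh, j:j + kw] * kernel))
    return out
```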
5. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 4 comprises:
step 4.1, using a learnable parameter matrix, extracting a query vector from the current local spatial features and the global features by a convolution operation, as shown in the following formula (2):
in formula (2): * denotes matrix multiplication, and [;] denotes the concatenation of vectors;
step 4.2, calculating the cross-correlation coefficient matrix between the query vector and the three-dimensional memory storage map, as shown in the following formula (3):
in formula (3): σ denotes the sigmoid activation function, and the correlation operator denotes the calculation of cross-correlation coefficients;
step 4.3, using the cross-correlation coefficient matrix to weight the three-dimensional memory storage map, and generating a context vector by convolving the weighted result, as shown in the following formula (4):
in formula (4): fc(·) expands a two-dimensional vector into a three-dimensional vector by replicating the data along the third dimension; fc(·) is given by the following formula (5):
in formula (5): the operator denotes element-wise multiplication.
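A numerical sketch of the context read in formulas (3)-(5): the query is correlated with every memory cell, the sigmoid of each correlation weights that cell, and the weighted memory is pooled into a context vector. Mean pooling stands in for the convolution of formula (4); shapes are illustrative assumptions.

```python
import numpy as np

def context_read(memory, query):
    """memory: (D, H, W) map holding a D-dim vector per cell; query: (D,).
    Illustrative sketch only; mean pooling replaces the final convolution."""
    # formula (3) analogue: sigmoid of query-cell correlations
    corr = 1.0 / (1.0 + np.exp(-np.einsum('d,dhw->hw', query, memory)))
    # formula (5) analogue: replicate the 2D weight map along D and multiply
    weighted = memory * corr[None, :, :]
    # formula (4) analogue: collapse the weighted memory into a context vector
    return weighted.mean(axis=(1, 2))
```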
6. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 5 comprises:
step 5.1, selecting from the three-dimensional memory storage map the cubic region to be updated according to the current position of the unmanned platform, which determines the spatial granularity of the write feature vector;
step 5.2, using a learnable parameter matrix, generating a reset gate vector from the inputs by a convolution operation, as shown in the following formula (6):
step 5.3, using a learnable parameter matrix, generating an update gate vector from the inputs by a convolution operation, as shown in the following formula (7):
step 5.4, using learnable parameter matrices, generating a candidate vector from the inputs by a convolution operation with the help of the reset gate, as shown in the following formula (8):
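The gated local update of formulas (6)-(8) mirrors a GRU cell. The sketch below uses matrix products in place of the patent's convolutions, and the final interpolation between old memory and candidate is an assumption, since the corresponding formula appears only as an image in the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_write(m_local, x, Wr, Wz, Wh, Uh):
    """m_local: current memory at the platform's cell; x: write feature vector.
    Matrix products stand in for the patent's convolutions."""
    inp = np.concatenate([x, m_local])
    r = sigmoid(Wr @ inp)                      # formula (6): reset gate
    z = sigmoid(Wz @ inp)                      # formula (7): update gate
    h = np.tanh(Wh @ x + Uh @ (r * m_local))   # formula (8): candidate vector
    return (1.0 - z) * m_local + z * h         # assumed final interpolation
```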
7. The multi-agent spatiotemporal modeling and decision-making based human-computer collaborative awareness method according to claim 1, wherein in step 6, the current spatial feature information, the stored global features and the context information are used to generate a feature vector by convolution, as shown in the following formula (10):
in formula (10): φoutput(·) denotes a convolution operation, and [;] denotes the concatenation of vectors.
8. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 7 comprises:
step 7.1, the unmanned platform u inputs the feature vector into the policy function and the value function respectively, generating an action and a value estimate.
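Step 7.1 can be sketched as follows: the same feature vector feeds both heads, the policy head giving the mean of a Gaussian over the 2D move and the value head a scalar estimate. The linear heads and the fixed noise scale are illustrative assumptions standing in for the patent's networks.

```python
import numpy as np

def policy_and_value(feature, W_pi, W_v, rng):
    """One feature vector, two heads: a stochastic policy over 2D moves and a
    scalar value estimate; linear maps stand in for the patent's networks."""
    mean = W_pi @ feature                          # policy head output (action mean)
    action = mean + 0.1 * rng.standard_normal(2)   # stochastic action sample
    value = float(W_v @ feature)                   # value head output
    return action, value
```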
9. The multi-agent spatiotemporal modeling and decision-based human-computer collaborative awareness method according to claim 1, wherein step 8 comprises:
step 8.1, repeatedly executing the steps 2-7 until the data acquisition task is finished;
step 8.2, each unmanned platform u collects its trajectory data and, based on it, calculates a cumulative reward estimate and an advantage estimate; the cumulative reward estimate for time step i is calculated as shown in the following formula (11):
in formula (11): γ ∈ [0, 1] is a discount factor; the advantage estimate is calculated using the GAE method, as shown in the following formula (12):
in formula (12): λ ∈ [0, 1] is a discount factor, and the temporal-difference error is given by the following formula (13):
step 8.3, each unmanned platform u slices its trajectory data into segments of length K along the time dimension and adds the generated sequence samples to its sample library;
step 8.4, each unmanned platform u samples M sequence samples in batches from its sample library, updates the parameters θu based on the joint loss function of PPO, and then proceeds to the next round, where the joint loss combines the loss function of the policy function, the loss function of the value function and a regularization term related to the policy function, calculated as shown in the following formulas (14) to (16):
in formula (14): S is the policy entropy, and c1, c2, ε1, ε2 are all constants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110943514.0A CN113805568B (en) | 2021-08-17 | 2021-08-17 | Man-machine collaborative perception method based on multi-agent space-time modeling and decision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113805568A true CN113805568A (en) | 2021-12-17 |
CN113805568B CN113805568B (en) | 2024-04-09 |
Family
ID=78893696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110943514.0A Active CN113805568B (en) | 2021-08-17 | 2021-08-17 | Man-machine collaborative perception method based on multi-agent space-time modeling and decision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113805568B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203689A (en) * | 2016-07-04 | 2016-12-07 | 大连理工大学 | A kind of Hydropower Stations cooperation Multiobjective Optimal Operation method |
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game |
CN112651486A (en) * | 2020-12-09 | 2021-04-13 | 中国人民解放军陆军工程大学 | Method for improving convergence rate of MADDPG algorithm and application thereof |
CN112880688A (en) * | 2021-01-27 | 2021-06-01 | 广州大学 | Unmanned aerial vehicle three-dimensional flight path planning method based on chaotic self-adaptive sparrow search algorithm |
Non-Patent Citations (2)
Title |
---|
YOUQI LI.ETC: "MP-Coopetition: Competitive and Cooperative Mechanism for Multiple Platforms in Mobile Crowd Sensing", 《IEEE TRANSACTIONS ON SERVICES COMPUTING》, vol. 14, no. 6, pages 1935 - 1947 * |
YU WANG ETC.: "Human-Drone Collaborative Spatial Crowdsourcing by Memory-Augmented and Distributed Multi-Agent Deep Reinforcement Learning", 《2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING》, pages 459 - 471 * |
Also Published As
Publication number | Publication date |
---|---|
CN113805568B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113110592B (en) | Unmanned aerial vehicle obstacle avoidance and path planning method | |
Rauch et al. | Maximum likelihood estimates of linear dynamic systems | |
CN101413806B (en) | Mobile robot grating map creating method of real-time data fusion | |
CN110118560B (en) | Indoor positioning method based on LSTM and multi-sensor fusion | |
CN110414732B (en) | Travel future trajectory prediction method and device, storage medium and electronic equipment | |
CN114625151A (en) | Underwater robot obstacle avoidance path planning method based on reinforcement learning | |
CN112069573A (en) | City group space simulation method, system and equipment based on cellular automaton | |
CN111027627A (en) | Vibration information terrain classification and identification method based on multilayer perceptron | |
Zhao et al. | A deep reinforcement learning based searching method for source localization | |
CN111376273A (en) | Brain-like inspired robot cognitive map construction method | |
CN111242352A (en) | Parking aggregation effect prediction method based on vehicle track | |
Dai et al. | Spatio-temporal deep learning framework for traffic speed forecasting in IoT | |
CN115561834A (en) | Meteorological short-term and temporary forecasting all-in-one machine based on artificial intelligence | |
CN116052427A (en) | Inter-city inter-regional mobility prediction method and device based on private car travel track data | |
CN103839280B (en) | A kind of human body attitude tracking of view-based access control model information | |
Wei et al. | Sensor-fusion for smartphone location tracking using hybrid multimodal deep neural networks | |
CN113159371B (en) | Unknown target feature modeling and demand prediction method based on cross-modal data fusion | |
CN114004152A (en) | Multi-wind-field wind speed space-time prediction method based on graph convolution and recurrent neural network | |
CN113805568B (en) | Man-machine collaborative perception method based on multi-agent space-time modeling and decision | |
CN116167254A (en) | Multidimensional city simulation deduction method and system based on city big data | |
Chancán et al. | CityLearn: Diverse real-world environments for sample-efficient navigation policy learning | |
CN104504207A (en) | Agent-based scenic spot visitor behavior simulation modeling method | |
Hugues | Collective grounded representations for robots | |
CN118393900B (en) | Automatic driving decision control method, device, system, equipment and storage medium | |
Wang et al. | Path planning model of mobile robots in the context of crowds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||