CN112231967A - Crowd evacuation simulation method and system based on deep reinforcement learning - Google Patents

Crowd evacuation simulation method and system based on deep reinforcement learning

Info

Publication number
CN112231967A
Authority
CN
China
Prior art keywords
evacuation
leader
crowd
path
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010942444.2A
Other languages
Chinese (zh)
Other versions
CN112231967B (en)
Inventor
刘弘
李信金
孟祥栋
赵缘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202010942444.2A priority Critical patent/CN112231967B/en
Publication of CN112231967A publication Critical patent/CN112231967A/en
Application granted granted Critical
Publication of CN112231967B publication Critical patent/CN112231967B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 — Computer-aided design [CAD]
    • G06F 30/20 — Design optimisation, verification or simulation
    • G06F 30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 — Administration; Management
    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/047 — Optimisation of routes or paths, e.g. travelling salesman problem
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 — Services
    • G06Q 50/26 — Government or public services
    • G06Q 50/265 — Personal security, identity or safety
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 10/00 — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A 10/40 — Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Alarm Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosed crowd evacuation simulation method and system based on deep reinforcement learning comprise: initializing a constructed evacuation scene simulation model according to scene information and crowd parameter information; grouping the crowd and designating a leader and followers for each group; and obtaining the crowd's evacuation paths with a hierarchical path planning method, in which each upper-layer group leader performs global path planning through the E-MADDPG algorithm to obtain an optimal evacuation path, and the lower-layer followers in each group evacuate along that path while avoiding obstacles and following their leader. A learning curve and a high-priority experience replay strategy are introduced into the traditional MADDPG algorithm to form the E-MADDPG algorithm, improving its learning efficiency. A hierarchical path planning method built on E-MADDPG plans the crowd's evacuation paths, effectively shortening path planning time, guiding the crowd more effectively, and improving crowd evacuation efficiency.

Description

Crowd evacuation simulation method and system based on deep reinforcement learning
Technical Field
The disclosure relates to a crowd evacuation simulation method and system based on deep reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With public safety incidents occurring ever more frequently, large-scale crowd evacuation has become an important link that cannot be ignored in emergency management. In densely crowded places, once a dangerous accident happens, people rush to escape the scene to avoid danger, causing congestion during the evacuation process. If the crowd cannot be evacuated in time, collision and trampling accidents may occur, causing secondary harm to the evacuees. Meanwhile, large-scale crowd evacuation is a complex process, and real large-scale evacuation experiments are hard to conduct owing to organizational difficulty, high cost, and personnel safety concerns. Computer simulation has therefore become the main means of analyzing the evacuation process and evaluating evacuation efficiency.
How to improve crowd evacuation efficiency and avoid secondary harm has long concerned researchers. Reinforcement learning is one of the research hotspots in artificial intelligence in recent years, and combining reinforcement learning with path planning offers a new way to improve crowd evacuation efficiency. Path planning algorithms based on multi-agent reinforcement learning greatly improve planning efficiency and, because they learn continuously, adapt to dynamic environments and are more practical. However, most real evacuation scenes are complex and difficult for traditional reinforcement learning methods to handle, whereas deep learning can effectively process high-dimensional input and thus cope better with complex real scenes. Combining reinforcement learning's learning strategies with deep learning's ability to handle high-dimensional input therefore suits crowd evacuation simulation well. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm proposed by Lowe et al. is a recent multi-agent deep reinforcement learning algorithm, but it suffers from a fixed state space, purely random experience replay, and similar problems that seriously harm its learning efficiency. Moreover, as the number of agents guiding evacuation and the complexity of the environment grow, a huge state space inevitably arises, seriously limiting the algorithm's effectiveness in the crowd evacuation field.
Disclosure of Invention
To solve these problems, the invention provides a crowd evacuation simulation method and system based on deep reinforcement learning. A learning curve and a high-priority experience replay strategy are introduced into the traditional MADDPG algorithm to form the Efficient Multi-Agent Deep Deterministic Policy Gradient (E-MADDPG) algorithm, improving the algorithm's learning efficiency. A hierarchical path planning method built on the E-MADDPG algorithm plans the crowd's evacuation paths, effectively shortening path planning time, guiding the crowd better, and improving crowd evacuation efficiency.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
in one or more embodiments, a crowd evacuation simulation method based on deep reinforcement learning is provided, including:
initializing the constructed evacuation scene simulation model according to the scene information and the crowd parameter information;
grouping the crowds, and dividing a leader and a follower of each group;
and obtaining the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to obtain an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
Further, a real scene database of a shopping mall is received, and pedestrian motion stop points are obtained as the state space of the E-MADDPG algorithm.
Furthermore, variation parameters are added to the experience pool capacity and the number of sampling samples in the MADDPG algorithm to form an experience pool curve and a sampling sample curve of the E-MADDPG algorithm, and the size of the experience pool and the number of sampling samples are adjusted through the variation parameters, so that the state space of the E-MADDPG algorithm is dynamically variable.
Furthermore, during network training of the E-MADDPG algorithm, samples with high value are selected for experience replay.
In one or more embodiments, a deep reinforcement learning crowd evacuation simulation system based on experience pool optimization is provided, comprising:
the initialization setting module is used for carrying out initialization setting on parameters in the evacuation scene simulation model according to the scene information and the crowd parameter information;
the in-group guidance selection module is used for grouping the crowds; selecting a leader in the group;
and the evacuation simulation module acquires the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to acquire an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
In one or more embodiments, an electronic device is provided, comprising a memory and a processor, and computer instructions stored on the memory and executable on the processor, the computer instructions, when executed by the processor, performing the steps of the deep reinforcement learning-based crowd evacuation simulation method.
In one or more embodiments, a computer-readable storage medium is provided for storing computer instructions which, when executed by a processor, perform the steps of the deep reinforcement learning-based crowd evacuation simulation method.
Compared with the prior art, the beneficial effect of this disclosure is:
1. the multi-agent deep reinforcement learning algorithm is applied to the path planning of crowd evacuation, and the crowd evacuation efficiency is improved.
2. Considering the shortcomings of multi-agent deep reinforcement learning algorithms, the E-MADDPG algorithm is proposed on the basis of the MADDPG algorithm: a learning curve makes the experience pool dynamically variable to improve learning efficiency, the algorithm's random sampling mode is improved to increase learning effectiveness, and the state space is improved by extracting motion stop points from pedestrian video, effectively relieving the curse of dimensionality.
3. The crowd evacuation paths are obtained with a hierarchical path planning method. Considering herd psychology, the crowd is divided into leaders and followers, decomposing the large-scale crowd evacuation simulation problem into a set of sub-problems. Guiding evacuation through crowd grouping and leaders can effectively improve the evacuation efficiency of public places and ensure people's safety in emergencies.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flow chart of example 1 of the present disclosure;
fig. 2 is a pedestrian motion trajectory extracted by a YOLO V3 method in embodiment 1 of the present disclosure;
fig. 3 is an evacuation scenario diagram constructed in embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of a crowd grouping in accordance with embodiment 1 of the present disclosure;
fig. 5 is a schematic view of crowd evacuation in embodiment 1 of the present disclosure;
fig. 6 is a schematic diagram of the evacuation end time of the crowd according to embodiment 1 of the disclosure.
The specific implementation mode is as follows:
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
In the present disclosure, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only relational terms determined for convenience in describing structural relationships of the parts or elements of the present disclosure, and do not refer to any parts or elements of the present disclosure, and are not to be construed as limiting the present disclosure.
In the present disclosure, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present disclosure can be determined on a case-by-case basis by persons skilled in the relevant art or technicians, and are not to be construed as limitations of the present disclosure.
Example 1
The embodiment discloses a crowd evacuation simulation method based on deep reinforcement learning, which comprises the following steps:
initializing the constructed evacuation scene simulation model according to the scene information and the crowd parameter information;
grouping the crowds, and dividing a leader and a follower of each group;
and obtaining the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to obtain an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
Further, a real scene database of a shopping mall is received, and the pedestrian motion stop points are extracted from the pedestrian video with the YOLO V3 method as the state space of the E-MADDPG algorithm.
Furthermore, variation parameters are added to the experience pool capacity and the number of sampling samples in the MADDPG algorithm to form an experience pool curve and a sampling sample curve of the E-MADDPG algorithm, and the size of the experience pool and the number of sampling samples are adjusted through the variation parameters, so that the state space of the E-MADDPG algorithm is dynamically variable.
Furthermore, during network training of the E-MADDPG algorithm, samples with high value are selected for experience replay.
Further, the group leader performs global path planning through an E-MADDPG algorithm to obtain an optimal evacuation path, specifically:
acquiring all evacuation paths of the leader according to the exit position and the initial position of the leader;
calculating a reward value for each evacuation path;
and selecting the evacuation path with the maximum reward value as the optimal evacuation path.
Further, the intra-group followers avoid obstacles based on the RVO algorithm and follow the leaders to evacuate along the optimal evacuation path, and the method specifically comprises the following steps:
calculating all the speeds of the followers in collision and the optimal collision-free speed, wherein the direction of the optimal collision-free speed is the direction of the leader in the group moving along the optimal evacuation path;
acquiring the current position of a follower;
when the optimal collision-free speed of the follower is obtained, the position of the follower is updated.
The method for simulating crowd evacuation based on deep reinforcement learning is specifically described with reference to fig. 1 to 6, and includes the following steps:
step 1: receiving a real scene database of a shopping mall, and extracting a pedestrian movement parking point in a video as a state space by using a YOLO V3 method;
the state information of the deep reinforcement learning represents the environment information perceived by the intelligent agent and the change caused by the change of the self action, the state information is the basis for the intelligent agent to make a decision and evaluate the long-term benefit of the intelligent agent, the condition of the state design directly determines whether the deep reinforcement learning algorithm can be converged and the convergence speed is high or low, and as the evacuation scene is enlarged and refined, the explosion of the state space is inevitably caused, which is called a dimension disaster, in order to solve the problem, the embodiment provides a new state representation method, and the stationary point of the pedestrian motion track is extracted from the real pedestrian video by adopting a YOLO V3 method to obtain the corresponding state change point, on the basis, all the state change points in the scene can be used as the state space, and the process is shown in FIG. 2.
Step 2: creating an evacuation scene model and a character model according to preset evacuation scene parameter information, as shown in fig. 3, introducing the character model into the evacuation scene model, initializing the crowd parameter information as preset evacuation crowd parameter information, grouping the crowds, and dividing a leader and a follower for each group of crowds as shown in fig. 4;
the initialized population is grouped, the leader and the followers are divided, and each population in the space has one leader to evacuate the followers, as shown in fig. 5. The leader and follower should have the following characteristics:
(1) the leader needs to know the location of the exits.
(2) The follower should always follow the leader during evacuation.
The result after grouping is shown in fig. 4.
Step 3: during the evacuation process, a hierarchical path planning method is adopted to obtain the crowd's evacuation paths: the upper layer uses the E-MADDPG algorithm to perform global path planning for each group leader and obtain an optimal evacuation path, while the bottom layer uses the RVO algorithm to achieve collision avoidance for the followers in each group, who follow their leader along the optimal evacuation path;
in the top layer: when the exit position p is knownjAnd the initial position of leader i
Figure BDA0002674088420000081
When the operation is executed, the next position is reached
Figure BDA0002674088420000082
This position then continues to perform the operation as the current position and repeats the operation until he reaches the exit position pjAnd grouping the bitsThe sequence is regarded as the evacuation path of the leader i, and the successfully evacuated k paths are temporarily stored in a path BufferiIn (1). But the reward per path may be very different due to the simultaneous influence of other agents. The disclosure derives reward values for a set of sequences of positions in k paths in a buffer
Figure BDA0002674088420000083
When k candidate paths in the Path buffer area are traversed, the Path with the maximum reward R in the Path buffer area is selected as the optimal evacuation Path of the leader i, and a Path set Path is outputi
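A minimal sketch of this top-layer selection step follows; the path buffer layout and the per-step reward function `reward_fn` are hypothetical names for the quantities described above, not identifiers from the patent.

```python
import numpy as np

def best_path(path_buffer, reward_fn):
    """Top layer: choose the stored candidate path with the maximum
    cumulative reward R.  `path_buffer` is a list of position sequences
    (the k successfully evacuated paths); `reward_fn(p, p_next)` scores
    one step of a path."""
    rewards = [sum(reward_fn(p, q) for p, q in zip(path, path[1:]))
               for path in path_buffer]
    return path_buffer[int(np.argmax(rewards))]
```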
In the bottom layer: pedestrian obstacle avoidance with RVO is realized in two steps. First, all velocities v_c at which an individual i would collide with an individual j in its neighborhood are calculated; then the collision-free velocities of individual i are computed and the optimal collision-free velocity v_b is selected. v_b is a vector with both direction and magnitude; its direction points from the individual's current position p_i^t toward the next point of p_i, where p_i denotes the global path of the individual obtained by the upper layer and p_i^t the position of individual i at time t. Once the collision-free velocity of the individual is obtained, its position is updated to p_i^{t+1} = p_i^t + v_b·Δt, where p_i^t is the current position (the direction and update formulas appear only as images in the original). In this way the movement of all pedestrians during evacuation follows the optimal path obtained by the upper-layer route planning training.
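A sketch of this bottom-layer update, assuming the candidate collision-free directions come from an RVO solver that is not shown; the function signature and the preferred-speed parameter are illustrative assumptions.

```python
import numpy as np

def follower_step(pos, waypoint, v_pref, dt, collision_free_dirs):
    """Bottom layer: one RVO-style position update.  Among the candidate
    collision-free unit directions, pick the one best aligned with the
    direction toward the leader's next waypoint on the global path, then
    advance the position: p(t+1) = p(t) + v_b * dt."""
    goal = waypoint - pos
    goal_dir = goal / (np.linalg.norm(goal) + 1e-9)
    v_b = v_pref * max(collision_free_dirs,
                       key=lambda d: float(np.dot(d, goal_dir)))
    return pos + v_b * dt
```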
Step 4: the evacuation process ends when the number of persons who have left through the exits equals the total number of persons, as shown in fig. 6.
A learning curve and a high-priority experience playback strategy are introduced on the basis of a traditional MADDPG algorithm, so that an Efficient Multi-Agent Deep Deterministic Policy Gradient (E-MADDPG) algorithm is formed.
(1) Combined learning curve
Because the external environment provides little information, reinforcement learning learns by trial and error; through this mode, the reinforcement learning system gains experience in the action-evaluation loop with the environment and improves its action scheme to suit the environment. Learning curve theory shows that learning efficiency grows as knowledge accumulates, so the fixed-capacity experience pool in the MADDPG algorithm undoubtedly limits learning efficiency.
Among the many learning curves, the Wright learning curve is the most widely used; its equation is as follows:

y(x) = k·x^α    (1)
α = lg x / lg 2    (2)

where x is the number of trials, k is the initial learning effect (the time of the first trial), y(x) is the time used for the x-th trial, and α is the learning coefficient.
Referring to the relation between production capacity and time in learning curve theory, the disclosure adds variation parameters to the experience pool capacity and the sampling number in the algorithm, giving an experience-pool curve and a sampling-sample curve. The experience pool of the MADDPG algorithm is improved by combining the experience-pool curve: a variation parameter β is added, and the size of the experience pool is adjusted through it, making the pool dynamically variable and eliminating the influence of a too-small or too-large experience pool on learning efficiency during learning. The improved variation function is:

[Equations (3) and (4), rendered as images in the original: the variation function of the experience-pool size.]

where R(t) is the current pool size and t is the number of learning iterations.
Likewise, as the number of samples grows greatly, a fixed sampling number may harm learning efficiency, so the disclosure adjusts the sampling number through the variation parameter β. The improved variation function is:

[Equation (5), rendered as an image in the original: the variation function of the sampling number.]

where N(t) is the current sampling number and t is the number of learning iterations.
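Because equations (3)-(5) appear only as images, their exact form cannot be reproduced here. The following sketch shows the mechanism under the assumption of a Wright-style power-law schedule; all constants are illustrative, not values from the patent.

```python
def pool_size(t, r0=10_000, r_max=1_000_000, beta=0.5):
    """Illustrative experience-pool schedule R(t): grows with the learning
    count t as a power law inspired by the Wright curve, capped at r_max.
    The patent's exact equations (3)-(4) are images; this is an assumption."""
    return min(int(r0 * (t + 1) ** beta), r_max)

def sample_size(t, n0=64, n_max=1024, beta=0.3):
    """Illustrative sampling-number schedule N(t); same caveat as above."""
    return min(int(n0 * (t + 1) ** beta), n_max)

# Example usage each training episode (buffer API is hypothetical):
#   buffer.capacity = pool_size(episode)
#   batch = buffer.sample(sample_size(episode))
```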
(2) Prioritized experience replay
In the conventional experience replay mechanism, random sampling makes the experiences passed to network training completely random, so the network's training efficiency is low. To address this, the disclosure selects valuable samples from the replay buffer. The core idea of prioritized experience replay is to replay very successful attempts, or extremely bad ones, more frequently, thereby increasing learning efficiency. The idea comes from prioritized sweeping, which replays the samples most useful for learning at high frequency; this disclosure uses the TD-error to measure a sample's usefulness. The TD-error is the difference between the estimated value of an action and the value output by the current value function: the larger the TD-error, the higher the sample's value. Replaying high-value samples more frequently helps the agents learn more from the samples, improving the effectiveness of learning and thus overall performance.
To ensure that a new sample, whose TD-error is temporarily unknown, is replayed at least once, it is placed first; thereafter, the sample with the largest TD-error is replayed each time.
The disclosure selects the absolute TD-error |δ_t| of a sample as the criterion for evaluating its value. δ_t is given by:

δ_t = r(s_i, a_i) + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ^{μ′}) | θ^{Q′}) − Q(s_i, a_i | θ^Q)    (6)

where r(s_i, a_i) is the reward function, γ ∈ (0, 1) is the discount factor, Q′(s, a|θ^{Q′}) is the target action-value network, Q(s, a|θ^Q) is the action-value network, μ(s|θ^μ) is the actor network, and θ^Q and θ^μ are the network parameters.
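A minimal sketch of this replay policy: new transitions enter with infinite priority so they are replayed at least once, and sampling always takes the largest-|δ| transitions. The max-heap structure and the eviction rule are implementation assumptions (a production version would typically use a sum-tree); the patent specifies only the priority criterion.

```python
import heapq
import itertools

class PriorityReplay:
    """Replay buffer keyed on |TD-error|, per the strategy above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []                    # entries: (-|delta|, tiebreak, transition)
        self.counter = itertools.count()  # tiebreaker so transitions never compare

    def add(self, transition, td_error=float("inf")):
        # New samples get infinite priority -> replayed at least once.
        if len(self.heap) >= self.capacity:
            # Evict the lowest-|delta| transition (O(n) in this sketch).
            self.heap.remove(max(self.heap))
            heapq.heapify(self.heap)
        heapq.heappush(self.heap, (-td_error, next(self.counter), transition))

    def sample(self, n):
        # Pop the n largest-|delta| transitions; the caller re-adds them
        # with the freshly computed |delta_t| after the training step.
        batch = [heapq.heappop(self.heap) for _ in range(min(n, len(self.heap)))]
        return [t for _, _, t in batch]
```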
The disclosure defines the standard reinforcement learning elements for multi-agent reinforcement learning, specifically as follows:
Definition 1 (state): denoted S; s_t ∈ S can be expressed as the pedestrian's position at time t. During learning, S includes the leader's current location and the set of waypoints of the path plan.
Definition 2 (action): denoted A; a_t ∈ A represents the agent selecting the action that moves it from the current state to the next state.
Definition 3 (reward function): denoted R; it represents the environment's reward for an action after action a is executed. Multi-agent path planning must mainly complete two tasks: reaching the destination and avoiding collision. The reward function should be closely related to both tasks, and is defined as follows:
[Equation (7), rendered as an image in the original: the piecewise reward R, which rewards reaching the exit, penalizes collision, and otherwise returns the adaptation value r.]

where r is an adaptation function defined as:

r = μ₁·(d_i − d_{i+1}) + μ₂·(d_k − d_{k+1}) + μ₃·(c_j − c_{j+1})    (8)

where μ₁, μ₂, μ₃ are usually positive values with μ₁ + μ₂ + μ₃ = 1; d_i denotes the minimum distance from the current position to an exit, d_obs the distance from the current position to the nearest obstacle, and c_j the congestion level of exit j. These are defined as:

d_i = min_{j∈(1,m)} √((x_i − x_j)² + (y_i − y_j)²)    (9)
d_obs = min_{k∈(1,n)} √((x_i − x_k)² + (y_i − y_k)²)    (10)
c_j = p_j / b_j    (11)

where (x_i, y_i) is the current location of the leader; (x_j, y_j), j ∈ (1, m), are the exit positions; (x_k, y_k), k ∈ (1, n), are the obstacle positions; p_j is the number of persons at target point j; and b_j is the number of persons passing target point j per unit time.
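A sketch of the adaptation function of equations (8)-(11), assuming the minimum-Euclidean-distance reading of equations (9)-(10) reconstructed above; the weights mu are illustrative values, not taken from the patent.

```python
import math

def min_dist(p, points):
    """Equations (9)-(10): minimum Euclidean distance from p to a point set."""
    return min(math.dist(p, q) for q in points)

def congestion(exit_idx, counts, flow):
    """Equation (11): c_j = p_j / b_j (persons waiting / persons per unit time)."""
    return counts[exit_idx] / flow[exit_idx]

def shaping_reward(p_now, p_next, exits, obstacles, exit_idx,
                   counts_now, counts_next, flow, mu=(0.4, 0.3, 0.3)):
    """Equation (8) as written in the text: reward the change in exit
    distance, obstacle distance, and exit congestion between steps."""
    d_term = min_dist(p_now, exits) - min_dist(p_next, exits)
    o_term = min_dist(p_now, obstacles) - min_dist(p_next, obstacles)
    c_term = (congestion(exit_idx, counts_now, flow)
              - congestion(exit_idx, counts_next, flow))
    return mu[0] * d_term + mu[1] * o_term + mu[2] * c_term
```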
The proposed E-MADDPG algorithm specifically comprises the following steps (the update formulas, rendered as images in the original, are reconstructed here following the standard MADDPG/DDPG form):

Randomly initialize the actor network and critic network parameters θ^μ and θ^Q.
Initialize the target network parameters θ^{μ′} and θ^{Q′}.
Initialize the replay buffer D, the sampling number N, the minimum absolute TD-error |δ_min|, and the pointer P = 1.
for episode = 1, M do
    Initialize a random process Φ for behavior exploration
    Receive an initial observation state s_1
    for t = 1, T do
        For each agent_i, select an action based on the current policy and exploration noise: a_t = μ(s_t|θ^μ) + Φ_t
        Perform action a_t, obtaining the reward value r_t and the new state s_{t+1}
        Store the experience e_t = (s_t, a_t, r_t, s_{t+1}) in replay buffer D
        Calculate the TD-error |δ_t| of sample e_t by equation (6)
        If |δ_t| > |δ_min|, then
            Insert e_t; query the TD-error minimum and update |δ_min|;
            P = P + 1
        End if
        For agent_i = 1, X do
            Sample N high-priority experiences (s_j, a_j, r_j, s_{j+1}) from D
            Set y_j = r_j + γ·Q′(s_{j+1}, μ′(s_{j+1})|θ^{Q′})
            Train the critic network by minimizing the loss function L:
                L = (1/N)·Σ_j (y_j − Q(s_j, a_j|θ^Q))²
            Update the actor policy with the sampled policy gradient:
                ∇_{θ^μ} J ≈ (1/N)·Σ_j ∇_a Q(s, a|θ^Q)|_{s=s_j, a=μ(s_j)} · ∇_{θ^μ} μ(s|θ^μ)|_{s_j}
        End for
        Update the target networks: θ′ ← τ·θ + (1 − τ)·θ′
    End for
End for
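For concreteness, a PyTorch sketch of one per-agent update from the listing above (critic target y, critic loss L, deterministic policy gradient, soft target update). The network classes, optimizers, and batch layout are assumptions, not the patent's code; a full MADDPG critic would take the joint observations and actions of all agents rather than a single agent's.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic, batch,
                actor_opt, critic_opt, gamma=0.95, tau=0.01):
    """One critic/actor update for a single agent.

    batch: tensors (s, a, r, s_next), each with leading dimension N.
    Returns |delta_t| per sample, for re-insertion into the priority buffer.
    """
    s, a, r, s_next = batch
    with torch.no_grad():
        # y = r + gamma * Q'(s', mu'(s'))  -- the target from the listing
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    q = critic(s, a)
    td_error = (y - q).detach().abs()        # |delta_t| for priority replay
    critic_loss = F.mse_loss(q, y)           # L = (1/N) * sum (y - Q)^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Deterministic policy gradient: ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft target update: theta' <- tau*theta + (1 - tau)*theta'.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
    return td_error
```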
in the embodiment, the multi-agent deep reinforcement learning algorithm is applied to the path planning of crowd evacuation, so that the crowd evacuation efficiency is improved.
In the embodiment, the defects of the multi-agent deep reinforcement learning algorithm are considered, the E-MADDPG algorithm is provided on the basis of the MADDPG algorithm, the learning efficiency is improved by combining the learning curve to enable the experience pool to be dynamically variable, and then the learning effectiveness is improved by improving the algorithm random sampling mode. And the state space of the algorithm is improved, and the motion stopping point is extracted from the pedestrian video to be used as the state space, so that the problem of dimension disaster is effectively solved.
In the embodiment, a hierarchical path planning method is adopted, the crowd is divided into a leader and a follower in consideration of the crowd psychology of people, and the large-scale crowd evacuation simulation problem is divided into a group of sub-problems. The evacuation is guided by the crowd grouping and the leader, so that the evacuation efficiency of the public places can be effectively improved, and the safety of people in emergencies is ensured.
Example 2
In this embodiment, a crowd evacuation simulation system based on deep reinforcement learning with experience pool optimization is disclosed, which includes:
the initialization setting module is used for carrying out initialization setting on parameters in the evacuation scene simulation model according to the scene information and the crowd parameter information;
the in-group guidance selection module is used for grouping the crowds; selecting a leader in the group;
and the evacuation simulation module acquires the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to acquire an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
Example 3
In this embodiment, an electronic device is disclosed, comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the deep reinforcement learning-based crowd evacuation simulation method.
Example 4
In this embodiment, a computer readable storage medium is disclosed for storing computer instructions which, when executed by a processor, perform the steps of the deep reinforcement learning based crowd evacuation simulation method.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. The crowd evacuation simulation method based on deep reinforcement learning is characterized by comprising the following steps:
initializing the constructed evacuation scene simulation model according to the scene information and the crowd parameter information;
grouping the crowds, and dividing a leader and a follower of each group;
and obtaining the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to obtain an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
2. The deep reinforcement learning-based crowd evacuation simulation method according to claim 1, wherein a database of real scenes in a shopping mall is received, and a YOLO V3 method is used to obtain pedestrian motion stop points from a pedestrian video as the state space of the E-MADDPG algorithm.
3. The deep reinforcement learning-based crowd evacuation simulation method according to claim 1, wherein variation parameters are added to the experience pool capacity and the number of sampling samples in the MADDPG algorithm to form an experience pool curve and a sampling sample curve of the E-MADDPG algorithm, and the state space of the E-MADDPG algorithm is dynamically variable by adjusting the experience pool size and the number of sampling samples through the variation parameters.
4. The deep reinforcement learning-based crowd evacuation simulation method according to claim 1, wherein during network training of the E-MADDPG algorithm, samples with high value are selected for experience replay.
5. The deep reinforcement learning-based crowd evacuation simulation method according to claim 1, wherein the group leader performs global path planning by using an E-MADDPG algorithm to obtain an optimal evacuation path, specifically:
acquiring all evacuation paths of the leader according to the exit position and the initial position of the leader;
calculating a reward value for each evacuation path;
and selecting the evacuation path with the maximum reward value as the optimal evacuation path.
6. The deep reinforcement learning-based crowd evacuation simulation method according to claim 5, wherein the reward value of each evacuation path is obtained by rewarding the exit selected by the leader according to whether the leader reaches the exit and whether a collision occurs.
7. The deep reinforcement learning-based crowd evacuation simulation method according to claim 1, wherein the group followers avoid obstacles based on the RVO algorithm to follow the leaders to evacuate along the optimal evacuation path, and the method comprises the following specific steps:
calculating all the speeds of the followers in collision and the optimal collision-free speed, wherein the direction of the optimal collision-free speed is the direction of the leader in the group moving along the optimal evacuation path;
acquiring the current position of a follower;
when the optimal collision-free speed of the follower is obtained, the position of the follower is updated.
8. The experience-pool-optimized deep reinforcement learning crowd evacuation simulation system, characterized by comprising:
the initialization setting module is used for carrying out initialization setting on parameters in the evacuation scene simulation model according to the scene information and the crowd parameter information;
the intra-group leader selection module is used for grouping all the individuals; selecting a leader in the group;
and the evacuation simulation module acquires the evacuation path of the crowd by adopting a hierarchical path planning method, wherein the leader in the upper-layer group carries out global path planning through an E-MADDPG algorithm to acquire an optimal evacuation path, and the follower in the lower-layer group carries out evacuation along the optimal evacuation path by avoiding obstacles and following the leader.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202010942444.2A 2020-09-09 2020-09-09 Crowd evacuation simulation method and system based on deep reinforcement learning Active CN112231967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010942444.2A CN112231967B (en) 2020-09-09 2020-09-09 Crowd evacuation simulation method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010942444.2A CN112231967B (en) 2020-09-09 2020-09-09 Crowd evacuation simulation method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112231967A true CN112231967A (en) 2021-01-15
CN112231967B CN112231967B (en) 2023-05-26

Family

ID=74117069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010942444.2A Active CN112231967B (en) 2020-09-09 2020-09-09 Crowd evacuation simulation method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112231967B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113156979A (en) * 2021-05-27 2021-07-23 浙江农林大学 Forest guard patrol path planning method and device based on improved MADDPG algorithm
CN113359859A (en) * 2021-07-16 2021-09-07 广东电网有限责任公司 Combined navigation obstacle avoidance method and system, terminal device and storage medium
CN114518771A (en) * 2022-02-23 2022-05-20 深圳大漠大智控技术有限公司 Multi-unmanned aerial vehicle path planning method and device and related components
KR20220141576A (en) * 2021-04-13 2022-10-20 한기성 Evacuation route simulation device using machine learning and learning method
CN115454074A (en) * 2022-09-16 2022-12-09 北京华电力拓能源科技有限公司 Evacuation path planning method and device, computer equipment and storage medium
CN116167145A (en) * 2023-04-23 2023-05-26 中铁第四勘察设计院集团有限公司 Method and system for constructing space three-dimensional safety evacuation system of under-road complex

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670270A (en) * 2019-01-11 2019-04-23 山东师范大学 Crowd evacuation emulation method and system based on the study of multiple agent deeply
CN110491132A (en) * 2019-07-11 2019-11-22 平安科技(深圳)有限公司 Vehicle based on video frame picture analyzing, which is disobeyed, stops detection method and device
CN111414681A (en) * 2020-03-13 2020-07-14 山东师范大学 In-building evacuation simulation method and system based on shared deep reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670270A (en) * 2019-01-11 2019-04-23 山东师范大学 Crowd evacuation emulation method and system based on the study of multiple agent deeply
CN110491132A (en) * 2019-07-11 2019-11-22 平安科技(深圳)有限公司 Vehicle based on video frame picture analyzing, which is disobeyed, stops detection method and device
CN111414681A (en) * 2020-03-13 2020-07-14 山东师范大学 In-building evacuation simulation method and system based on shared deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
许诺 et al., "Multi-agent cooperation based on the MADDPG algorithm under sparse rewards" (in Chinese), 《现代计算机》 (Modern Computer) *
郑尚菲, "Path planning method based on deep reinforcement learning and its applications" (in Chinese), 《中国优秀硕士学位论文全文数据库 社会科学Ⅰ辑》 (China Master's Theses Full-text Database, Social Sciences I) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220141576A (en) * 2021-04-13 2022-10-20 한기성 Evacuation route simulation device using machine learning and learning method
KR102521990B1 (en) 2021-04-13 2023-04-14 한기성 Evacuation route simulation device using machine learning and learning method
CN113156979A (en) * 2021-05-27 2021-07-23 浙江农林大学 Forest guard patrol path planning method and device based on improved MADDPG algorithm
CN113156979B (en) * 2021-05-27 2022-09-06 浙江农林大学 Forest guard patrol path planning method and device based on improved MADDPG algorithm
CN113359859A (en) * 2021-07-16 2021-09-07 广东电网有限责任公司 Combined navigation obstacle avoidance method and system, terminal device and storage medium
CN113359859B (en) * 2021-07-16 2023-09-08 广东电网有限责任公司 Combined navigation obstacle avoidance method, system, terminal equipment and storage medium
CN114518771A (en) * 2022-02-23 2022-05-20 深圳大漠大智控技术有限公司 Multi-unmanned aerial vehicle path planning method and device and related components
CN115454074A (en) * 2022-09-16 2022-12-09 北京华电力拓能源科技有限公司 Evacuation path planning method and device, computer equipment and storage medium
CN116167145A (en) * 2023-04-23 2023-05-26 中铁第四勘察设计院集团有限公司 Method and system for constructing space three-dimensional safety evacuation system of under-road complex

Also Published As

Publication number Publication date
CN112231967B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN112231967A (en) Crowd evacuation simulation method and system based on deep reinforcement learning
CN111142522B (en) Method for controlling agent of hierarchical reinforcement learning
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN106970615B (en) A kind of real-time online paths planning method of deeply study
CN110874578B (en) Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning
CN111766782B (en) Strategy selection method based on Actor-Critic framework in deep reinforcement learning
JP2022516383A (en) Autonomous vehicle planning
JP2020126619A (en) Method and device for short-term path planning of autonomous driving through information fusion by using v2x communication and image processing
EP3772710A1 (en) Artificial intelligence server
CN111274438B (en) Language description guided video time sequence positioning method
CN111476771B (en) Domain self-adaption method and system based on distance countermeasure generation network
CN112231968A (en) Crowd evacuation simulation method and system based on deep reinforcement learning algorithm
WO2022007867A1 (en) Method and device for constructing neural network
CN110795833A (en) Crowd evacuation simulation method, system, medium and equipment based on cat swarm algorithm
Szep et al. Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion.
CN113888638A (en) Pedestrian trajectory prediction method based on attention mechanism and through graph neural network
Hoy et al. Learning to predict pedestrian intention via variational tracking networks
WO2020099854A1 (en) Image classification, generation and application of neural networks
CN109508686A (en) A kind of Human bodys' response method based on the study of stratification proper subspace
CN112121419A (en) Virtual object control method, device, electronic equipment and storage medium
CN117455553B (en) Subway station passenger flow volume prediction method
JP2021197184A (en) Device and method for training and testing classifier
CN114548497B (en) Crowd motion path planning method and system for realizing scene self-adaption
CN112947466B (en) Parallel planning method and equipment for automatic driving and storage medium
CN112330043B (en) Evacuation path planning method and system combining Q-learning and multi-swarm algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant