CN109543285B - Crowd evacuation simulation method and system integrating data driving and reinforcement learning - Google Patents
Crowd evacuation simulation method and system integrating data driving and reinforcement learning
- Publication number
- CN109543285B (application CN201811382707.8A)
- Authority
- CN
- China
- Prior art keywords
- crowd
- state
- group
- similarity
- individuals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Mathematical Physics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Data Mining & Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Computational Linguistics (AREA)
- Game Theory and Decision Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a crowd evacuation simulation method and system integrating data driving and reinforcement learning. The method comprises the following steps: acquiring real video data and tracking the crowd in the video; dividing the crowd into groups according to motion similarity and obtaining the path information of each group; initializing scene information and crowd positions, and dividing the simulated crowd into groups; training an optimal strategy by reinforcement learning, guided by the path information of the groups in the video, to plan the paths of the groups in the scene; generating the paths of the individuals within each group from the path information of that group; and performing collision detection among individuals and coupling the result in real time with the crowd motion in the video data to realize crowd motion simulation. The invention can realistically simulate the social behavior of a crowd in a dynamic environment, reproduce the self-organization phenomena of the crowd, reduce the dependence on the amount of real data, and provide a reference for devising crowd evacuation schemes.
Description
Technical Field
The disclosure belongs to the field of computer simulation of crowd evacuation, and particularly relates to a crowd evacuation simulation method integrating data driving and reinforcement learning.
Background
Group modeling and simulation is a research field that has attracted growing attention from industry, academia and government departments in recent years. With the rapid development of the social economy in China, living standards keep improving and people travel more frequently. In public places with dense crowds, such as railway stations, tourist attractions and shopping squares, the flow of people can become very large within a short time. Small disturbances in the crowd can greatly reduce evacuation efficiency and pose a serious safety hazard, and crowding and trampling incidents occur easily if the crowd cannot be controlled effectively.
Simulating how a real crowd evacuates during a crisis event therefore makes it possible to anticipate and avoid potential crowding and trampling risks, and has significant research value. Traditional crowd evacuation simulation methods mainly aim to improve the evacuation efficiency of a crowd in a specific scene, but their many assumptions reduce the realism of the simulated crowd movement. Existing data-driven methods improve the realism of crowd simulation, but they depend heavily on the amount of data, and most of them cannot adapt to dynamic scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method first extracts real data from video and models the crowd behavior. It then incorporates a reinforcement learning method so that limited real data can be used to produce crowd movement in different scenes. Crowd movement is thereby simulated more realistically, and a reference is provided for devising crowd evacuation schemes.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
A crowd evacuation simulation method integrating data driving and reinforcement learning comprises the following steps:
acquiring real video data, and tracking the crowd in the video data;
dividing the crowd into groups according to motion similarity, and obtaining the path information of each group;
initializing scene information and crowd positions, and dividing the crowd into groups;
training an optimal strategy by reinforcement learning, guided by the path information of the groups in the video, to plan the paths of the groups in the scene;
generating the paths of the individuals within each group according to the path information of that group;
and performing collision detection among individuals, and coupling in real time with the crowd motion in the video data to realize crowd motion simulation.
Further, dividing the crowd into groups according to the motion similarity includes:
initializing m individuals as cluster centers; for each remaining individual, calculating its motion similarity to each of the m cluster centers and assigning it to the cluster with the largest similarity value;
for each cluster, generating a virtual cluster center, calculating the motion similarity between every individual in the cluster and the virtual cluster center, and taking the individual with the highest similarity to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;
iteratively updating the m cluster centers until they no longer change.
Further, the path information $E_r$ of a group is calculated as:
$$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$$
where $p'_c$ represents the position of the group at time c; $\omega_i$ and $\omega_j$ are the weights of the x coordinate and the y coordinate of the i-th individual in the group, respectively; $1 \le i \le n$, and n is the number of individuals in the group.
Further, the motion similarity is expressed as:
$$s(i, j) = w_d \, dis(i, j) + w_v \, vel(i, j) + w_o \, ori(i, j)$$
where $dis(i, j)$, $vel(i, j)$ and $ori(i, j)$ represent the distance similarity, speed similarity and direction similarity, respectively; $w_d$, $w_v$ and $w_o$ are the weights of the distance similarity, speed similarity and direction similarity, respectively, with $w_d + w_v + w_o = 1$.
Further, planning the paths of the groups in the scene includes:
discretizing the scene into a grid and representing each position state by the center of its cell; defining an action set $A_i$ for each state, where an action $a \in A_i$ selects the next state;
constructing a reward function from the positional relation between the action and the target state and from the similarity between the group trajectory in the video and the group trajectory in the scene;
and planning a path for each group in the scene based on the Q-learning method.
Further, performing path planning on each group in the scene based on the Q-learning method includes:
given a target state $s_{goal}$, initializing the Q matrix;
randomly selecting a state $s \in S_i$ as the initial state, and selecting an action $a \in A_i$ from the action set;
calculating the immediate reward $r(s, a)$ and the next state $s'$, and updating the Q matrix by the following formula until $s = s_{goal}$:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $\alpha$ is the learning rate, $\lambda$ is the attenuation factor, and $s'$ is the new state reached after taking action a in state s.
Further, the reward function is as follows:
where $w_1$ and $w_2$ are the weights of the distance function $r_d$ and the similarity function $r_{sim}$, respectively, and D is the difference between the distance from the current state position to the target point and the distance from the next state position to the target point;
distance function: $r_d = pathdist(s_{goal}, s) - pathdist(s_{goal}, s')$
where s is the current state, $s'$ is the state after an action is performed, and $pathdist(s_{goal}, s)$ represents the distance from state s to the target state;
the group trajectory in the real video is mapped into the three-dimensional space of the scene, where a represents the position vector from the current state to the next state in the video, and b represents the position vector from the current state to the next state in the simulation.
Further, the method for generating the path of the individual in the group according to the path information of each group is as follows:
assume that the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group is then:
$$\{(x_0 + \delta_1 \Delta x,\; y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\; y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\; y_n + \delta_2 \Delta y)\}$$
where $\delta_1$, $\delta_2$ are influence factors. When the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ are related by a first-order (linear) function and take values in $[-1, 1]$; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$.
Further, the inter-individual collision detection employs RVO technology.
One or more embodiments provide a computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the crowd evacuation simulation method integrating data driving and reinforcement learning.
One or more embodiments provide a computer system including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the crowd evacuation simulation method integrating data driving and reinforcement learning when executing the program.
The one or more of the above technical solutions have the following beneficial effects:
the present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method utilizes a group generation algorithm to model crowd grouping behaviors, combines video data and a reinforcement learning method to obtain a group motion path, and utilizes a position offset factor to obtain an individual path. The method can simulate the social behavior of the crowd in a dynamic environment realistically, embody the self-organization phenomenon of the crowd, reduce the dependence on real data quantity, and provide reference for the establishment of a crowd evacuation scheme.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart of a crowd evacuation simulation method integrating data driving and reinforcement learning according to an embodiment of the disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments and features of embodiments in this application may be combined with each other without conflict.
Reinforcement learning is known to produce powerful control strategies that can learn to imitate a wide range of example actions while adapting to changes in morphology and accomplishing user-specified goals. Fusing data-driven technology with reinforcement learning is therefore of great value for simulating a real crowd from limited data. First, the crowd in a real video is tracked, the grouping behavior of the crowd is modeled from distance and speed features, and the trajectories are fitted to obtain group path information; this information guides the next stage and improves both the efficiency of path computation and the realism of the simulation. Second, a path planning method integrating data driving and reinforcement learning is established with a two-layer mechanism: the upper layer combines the group path information from the video with the Q-Learning algorithm in reinforcement learning to train an optimal strategy and plan the group paths; the lower layer obtains individual paths through a position offset factor based on social characteristics and avoids collisions using the reciprocal velocity obstacles (RVO) method. Finally, the crowd animation is generated with a realistic rendering method.
Example 1
The embodiment discloses a crowd evacuation simulation method integrating data driving and reinforcement learning, as shown in fig. 1, comprising the following steps:
Step 1: Initialize the crowd positions and scene information.
In step 1, individual positions are initialized randomly within the scene, taking scene and obstacle information into account so that no individual is placed on an obstacle.
Step 2: Establish a data-driven grouping behavior modeling method.
The data-driven grouping behavior modeling method in step 2 comprises four parts: video capture and data extraction, definition of grouping behavior based on motion features, the group generation algorithm, and calculation of group path information.
Step 2.1: crowd tracking is carried out based on video data, and individual track information is obtained;
The crowd is filmed with a Xiaoyi ("Little Ant") smart camera, which records a real-time stream.
For a real video, the crowd is tracked with the TLD (Tracking-Learning-Detection) target tracking algorithm to obtain the trajectory of each individual. The TLD algorithm integrates a tracker, a detector and a learning module, so it can track continuously moving targets and can re-detect and re-track targets that reappear after being occluded, which makes it robust to occlusion.
When tracking a crowd video, environmental features (such as walls and obstacles) are identified first, and then the two-dimensional trajectory of each individual is tracked. If a trajectory is unsatisfactory, the user browses the video frames, specifies the location of the target in an intermediate frame, and applies bidirectional tracking to the two separate time intervals (the start-time interval and the middle-time interval). In this way the user can adaptively correct the trajectory until a satisfactory result is obtained. After tracking is completed, the required trajectory information E of each person is obtained, where each trajectory can be expressed as:
$$E = \{(p_1, v_1, o_1, t_1), \ldots, (p_i, v_i, o_i, t_i), \ldots, (p_n, v_n, o_n, t_n)\} \tag{1}$$
where $p_i = (x_i, y_i)$ is the position of the individual in each frame of the trajectory, $v_i$ is the speed of the individual at each moment, $o_i$ is the direction of the velocity at each moment, n is the total number of points in the trajectory, and $t_i$ is the time at which each frame is extracted.
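The trajectory record of formula (1) can be sketched as a small data structure. The following is a minimal illustration, not the patent's implementation: the names `TrackPoint` and `build_track` are hypothetical, and deriving speed and direction by finite differences between consecutive tracked positions is an assumption, since the source does not say how $v_i$ and $o_i$ are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackPoint:
    """One sample (p_i, v_i, o_i, t_i) of a trajectory."""
    x: float
    y: float
    speed: float      # v_i, magnitude of the velocity
    heading: float    # o_i, direction of the velocity in radians
    t: float          # t_i, time at which the frame was extracted


def build_track(positions, times):
    """Build a trajectory from tracked (x, y) positions and frame times.

    Assumption: speed and heading come from a forward difference,
    with a backward difference at the last sample. Needs >= 2 samples.
    """
    track = []
    last = len(positions) - 1
    for k, (x, y) in enumerate(positions):
        a, b = (k, k + 1) if k < last else (k - 1, k)
        dt = times[b] - times[a]
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        track.append(TrackPoint(x, y, math.hypot(dx, dy) / dt,
                                math.atan2(dy, dx), times[k]))
    return track


track = build_track([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)], [0.0, 1.0, 2.0])
```

Each element of `track` then carries the four quantities that the similarity measures of step 2.2 rely on.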
Step 2.2: Define the individual similarity based on similarity in distance and speed.
Using the video data, the individual similarity for grouping behavior is defined from the distance similarity, the speed magnitude similarity and the speed direction similarity, with a sigmoid function used for normalization. Specifically:
Distance similarity: the distance similarity $dis(i, j)$ of individuals i and j is calculated using the Euclidean distance,
$$dis(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{2}$$
where $x_i$ is the abscissa of the pedestrian position and $y_i$ is the ordinate of the pedestrian position.
When computing the speed and direction similarity, a sigmoid function, shown in formula (3), is applied within the speed difference amplitude function and the angle difference amplitude function. This speeds up training of the grouping model, avoids excessively large differences between values, and ensures that each similarity result is mapped into [0, 1].
Speed magnitude similarity: the speed magnitude similarity between individuals i and j is represented by a speed difference amplitude function $vel(i, j)$; the smaller the result, the more similar the individuals.
Here $v_i - v_j$ is the speed difference between individuals i and j; the difference is squared to make it more pronounced.
Speed direction similarity: the direction similarity between individuals i and j is represented by an angle difference amplitude function $ori(i, j)$; the smaller the result, the more similar the individuals.
Here $o_i - o_j$ is the difference in velocity direction between individuals i and j; the difference is squared to make it more pronounced.
Finally, during group movement, the similarity $s(i, j)$ between individuals i and j is calculated as:
$$s(i, j) = w_d \, dis(i, j) + w_v \, vel(i, j) + w_o \, ori(i, j) \tag{6}$$
where $w_d + w_v + w_o = 1$; $w_d$ is the weight of the distance similarity feature, $w_v$ is the weight of the speed magnitude similarity feature, and $w_o$ is the weight of the speed direction similarity feature. The larger the value of $s(i, j)$, the greater the similarity between the individuals.
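A weighted motion similarity of this kind can be sketched as follows. This is only an illustration under stated assumptions: the exact sigmoid-based formulas are not reproduced in the text, so the helper `closeness`, which maps a squared difference through a sigmoid to a score that equals 1 for identical values and decays toward 0, is a hypothetical stand-in chosen so that a larger combined score means more similar. The function names and default weights are likewise assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def closeness(diff):
    """Assumed mapping: 1.0 when diff == 0, approaching 0 for large diffs."""
    return 2.0 * (1.0 - sigmoid(diff * diff))


def angle_gap(a, b):
    """Difference between two headings, wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def motion_similarity(p, q, w_d=0.4, w_v=0.3, w_o=0.3):
    """p and q are (x, y, speed, heading); weights must sum to 1."""
    assert abs(w_d + w_v + w_o - 1.0) < 1e-9
    dis = closeness(math.hypot(p[0] - q[0], p[1] - q[1]))  # distance term
    vel = closeness(p[2] - q[2])                            # speed term
    ori = closeness(angle_gap(p[3], q[3]))                  # direction term
    return w_d * dis + w_v * vel + w_o * ori


me = (0.0, 0.0, 1.0, 0.0)
near = (0.5, 0.0, 1.1, 0.1)
far = (5.0, 5.0, 3.0, 2.0)
s_near = motion_similarity(me, near)
s_far = motion_similarity(me, far)
```

With this mapping every term lies in [0, 1], so the weighted sum also lies in [0, 1] and a companion walking alongside scores higher than a distant pedestrian heading elsewhere.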
Step 2.3: Group the crowd with the group generation algorithm according to the individual similarity.
The group generation algorithm comprises the following steps:
(1) Select m individuals as the initial cluster centers $(\beta_1, \ldots, \beta_i, \ldots, \beta_m)$;
(2) Calculate the similarity between each remaining individual in the crowd and the m cluster centers using formula (6), and assign each remaining individual to the cluster with the maximum similarity value;
(3) For each cluster, generate a virtual cluster center $O_v$, whose position is the geometric center of all individuals in the cluster and whose speed and direction are the averages of the speeds and directions of all individuals in the cluster;
(4) For each cluster, calculate the similarity $s(O_v, P_i)$ between every individual in the cluster and its virtual cluster center according to formula (6), where $P_i \in cluster_n$, $1 \le n \le m$, and n is the cluster index; select the element $P_i$ with the maximum similarity to the virtual cluster center and take it as the new cluster center;
(5) Repeat steps (2), (3) and (4) until the m cluster centers no longer change;
(6) The crowd is divided into m groups; the group of individual i is $C(i)$, where $C(i) \in C$, $C = \{C(1), \ldots, C(m)\}$, and C is the set of groups.
In steps (3) and (4), the cluster center is updated for each group using a virtual group center $O_v$ generated according to formulas (7), (8), (9) and (10),
where $1 \le j \le$ agennum; agennum is the total number of individuals, num is the number of individuals in the cluster, $x_i$ and $y_i$ are the abscissa and ordinate of individual i, people[i].vel[j] is the speed similarity between individuals i and j, and people[i].ori[j] is the direction similarity between individuals i and j.
There are two constraints in the group generation process: (1) each group contains at least one individual; (2) each individual belongs to exactly one group.
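The group generation steps above resemble a medoid-style clustering loop and can be sketched as follows. This is a minimal sketch under assumptions: the `similarity` function is a simple stand-in rather than formula (6), the first m individuals are used as initial centers because the source does not say how they are chosen, and the heading of the virtual center uses a circular mean instead of a plain average to avoid wrap-around problems. All function names are hypothetical.

```python
import math


def angle_gap(a, b):
    """Absolute difference between two headings, wrapped into [0, pi]."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)


def similarity(p, q):
    """Stand-in similarity (assumption): larger means more alike."""
    gap = (math.hypot(p[0] - q[0], p[1] - q[1])
           + abs(p[2] - q[2]) + angle_gap(p[3], q[3]))
    return 1.0 / (1.0 + gap)


def virtual_center(members):
    """Geometric center, mean speed and (circular) mean heading of a cluster."""
    n = len(members)
    heading = math.atan2(sum(math.sin(m[3]) for m in members),
                         sum(math.cos(m[3]) for m in members))
    return (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n,
            sum(m[2] for m in members) / n, heading)


def generate_groups(people, m, sim=similarity, max_iter=100):
    """people: list of (x, y, speed, heading). Returns lists of indices."""
    centers = list(range(m))                       # step (1)
    clusters = {}
    for _ in range(max_iter):
        clusters = {c: [c] for c in centers}
        for i in range(len(people)):               # step (2)
            if i not in clusters:
                best = max(centers, key=lambda c: sim(people[i], people[c]))
                clusters[best].append(i)
        new_centers = []
        for members in clusters.values():          # steps (3) and (4)
            o_v = virtual_center([people[i] for i in members])
            new_centers.append(max(members, key=lambda i: sim(o_v, people[i])))
        if sorted(new_centers) == sorted(centers):  # step (5)
            break
        centers = new_centers
    return list(clusters.values())                 # step (6)


people = [(0.0, 0.0, 1.0, 0.0), (0.5, 0.0, 1.0, 0.0), (10.0, 10.0, 1.0, 3.0),
          (10.5, 10.0, 1.0, 3.0), (0.0, 0.5, 1.0, 0.0)]
groups = generate_groups(people, 2)
```

Because every cluster always contains its own center and each individual is assigned to exactly one center, the two constraints stated above hold by construction.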
Step 2.4: Calculate the group path information.
The group path information is calculated as:
$$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k) \tag{11}$$
where $p'_c$ represents the position of a group at time c, and $\omega_i$, $\omega_j$ are the weights of the x and y coordinates, respectively.
When deriving the group paths, a fitted group path may collide with an obstacle. When this occurs, the fitted position is replaced with the position of one of the individuals in the group.
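One way to picture the group path computation is a weighted average of member positions at each time step, with the obstacle fallback described above. This sketch rests on assumptions: the formula for $p'_c$ is not reproduced in the text, so a normalized weighted mean is assumed; a single weight per individual is used for both coordinates, whereas the source names separate weights for x and y; the `blocked` predicate is hypothetical; and choosing the member nearest to the fitted point as the replacement is an illustrative choice.

```python
def group_path(member_paths, weights=None, blocked=lambda point: False):
    """Fit one path per group from the paths of its members.

    member_paths: list of paths, each a list of (x, y) per time step.
    weights: one weight per member (assumed; defaults to equal weights).
    blocked: hypothetical predicate, True if a point lies inside an obstacle.
    """
    n = len(member_paths)
    steps = min(len(path) for path in member_paths)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    fitted = []
    for c in range(steps):
        x = sum(w * path[c][0] for w, path in zip(weights, member_paths)) / total
        y = sum(w * path[c][1] for w, path in zip(weights, member_paths)) / total
        point = (x, y)
        if blocked(point):
            # Fall back to a real member position; here the nearest one.
            point = min((path[c] for path in member_paths),
                        key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)
        fitted.append(point)
    return fitted


paths = [[(0, 0), (2, 0)], [(0, 2), (2, 2)]]
free = group_path(paths)
with_wall = group_path(paths, blocked=lambda p: p == (0.0, 1.0))
```

In the second call the first fitted point falls on the hypothetical obstacle, so it is replaced by a member's own position while the rest of the path is unchanged.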
Step 3: Path planning that fuses reinforcement learning and data driving.
Step 3 provides a path planning method that fuses data driving and reinforcement learning. It consists of three parts: the first part combines the group path information from the video with the Q-Learning algorithm in reinforcement learning to train an optimal strategy and thereby plan the group paths; the second part obtains individual paths through a position offset factor during the lower-level individual movement; the third part avoids collisions between individuals using the reciprocal velocity obstacles (RVO) method.
Step 3.1: Group the initialized crowd using the group generation algorithm.
Step 3.2: Fuse the Q-Learning algorithm in the reinforcement learning method with the group data from the video and train an optimal strategy. Given a target state $s_{goal}$, the action value function is learned through simple iterative value updates in the Q-Learning algorithm.
State set: the crowd state set is $S_i$, with state $s \in S_i$,
$$s = (x, y) \tag{13}$$
When planning a path for a group, the scene is discretized into a grid and each position state is represented by the center of its cell; discretizing the crowd state significantly reduces the number of states. Here $s = (x, y)$ represents the position of the pedestrian at each moment, and $s_{goal}$ is the target state; the crowd stops moving once it reaches the target state.
Action set: an action set $A_i$ is defined for each state. An action $a \in A_i$ selects the next state, where $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; from the current state s, choosing different directions leads to different states. Taking an action in state $s = (x, y)$ produces a new state $s' = (x', y')$, represented by the center point of the next grid cell.
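The state and action sets can be illustrated with a small grid helper. This is a sketch under assumptions: the cell size, the convention that north is the positive y direction, and the rule that a move into an obstacle or off the grid leaves the state unchanged are not specified in the source, and the function names are hypothetical.

```python
# Eight compass actions as (dx, dy) cell offsets; north is assumed to be +y.
ACTIONS = {
    "east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1),
    "northeast": (1, 1), "southeast": (1, -1),
    "northwest": (-1, 1), "southwest": (-1, -1),
}


def to_cell(x, y, cell_size):
    """Map a continuous position to the index of the grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def cell_center(cell, cell_size):
    """Position state s = (x, y): the center of the given cell."""
    return ((cell[0] + 0.5) * cell_size, (cell[1] + 0.5) * cell_size)


def step(cell, action, n_cols, n_rows, obstacles=frozenset()):
    """Apply an action; stay in place if the move is not possible (assumed)."""
    dx, dy = ACTIONS[action]
    nxt = (cell[0] + dx, cell[1] + dy)
    inside = 0 <= nxt[0] < n_cols and 0 <= nxt[1] < n_rows
    return nxt if inside and nxt not in obstacles else cell


start = to_cell(1.2, 0.4, 0.5)
moved = step(start, "northeast", 4, 4)
```

Working on cell indices and converting to cell centers only when a position is needed keeps the state space finite, which is what makes a tabular Q matrix practical.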
Reward function: given a target state $s_{goal}$, the immediate reward for taking action $a \in A_i$ in state $s \in S_i$ is defined by formula (14),
where r is the immediate reward value; s and $s'$ are the current state and the new state, respectively; $w_1$ and $w_2$ are the weights of the distance function and the similarity function, respectively; D is the difference between the distance from the current state position to the target point and the distance from the next state position to the target point; and $r_d$ and $r_{sim}$ are the distance function and the similarity function, respectively. The distance function is expressed as:
$$r_d = pathdist(s_{goal}, s) - pathdist(s_{goal}, s') \tag{15}$$
where s and $s'$ are the current state and the new state, respectively, and the $pathdist()$ function gives the distance from a position state to the target position state. A larger value of $r_d$ indicates that the action moves closer to the target location, so a larger reward is produced. The similarity between trajectories is calculated using cosine similarity: the trajectories are first mapped into the three-dimensional space of the scene, and the cosine of the angle between the two group vectors then measures their similarity. The cosine value lies in $[-1, 1]$; the closer it is to 1, the more similar the two group trajectories are. The similarity function is expressed as:
$$r_{sim} = \cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \tag{16}$$
where a is the position vector from the current state to the next state in the real data, and b is the position vector from the current state to the next state in the simulation. $r_{sim}$ uses cosine similarity to compare each step taken by a pedestrian in the simulation with the corresponding step in the video; the closer $r_{sim}$ is to 1, the more similar the pedestrian trajectories are, and the larger the reward.
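The two reward components can be sketched directly from formulas (15) and (16). The way they are combined is an assumption: formula (14) is not reproduced in the text, so a plain weighted sum $w_1 r_d + w_2 r_{sim}$ is used here purely for illustration. Using the Euclidean distance for `pathdist` and returning 0 for a zero-length vector are also assumptions, and the function names are hypothetical.

```python
import math


def distance_reward(s, s_next, goal, pathdist=math.dist):
    """r_d = pathdist(goal, s) - pathdist(goal, s'); positive when closer."""
    return pathdist(goal, s) - pathdist(goal, s_next)


def cosine_similarity(a, b):
    """r_sim: cosine of the angle between step vectors a (video) and b (sim)."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for a step that does not move
    return (a[0] * b[0] + a[1] * b[1]) / (norm_a * norm_b)


def reward(s, s_next, goal, video_step, w1=0.5, w2=0.5):
    """Assumed combination: weighted sum of the two components."""
    sim_step = (s_next[0] - s[0], s_next[1] - s[1])
    return (w1 * distance_reward(s, s_next, goal)
            + w2 * cosine_similarity(video_step, sim_step))


r_forward = reward((0, 0), (1, 0), (3, 0), video_step=(1, 0))
r_backward = reward((1, 0), (0, 0), (3, 0), video_step=(1, 0))
```

A step that both approaches the goal and matches the direction seen in the video is rewarded on both terms, while a step in the opposite direction is penalized on both.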
Training the optimal policy: given a target state $s_{goal}$, the action-value function is learned through the simple iterative value update of the Q-Learning algorithm:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right] \tag{17}$$
where $s$ and $a$ denote the current state and action, and $s'$ and $a'$ denote the next state and action. $r(s, a)$ is the direct reward value for taking action $a$ in state $s$, $Q(s, a)$ is the trained path estimate, $\max_{a'} Q(s', a')$ is the optimal path estimate, $\alpha$ is the learning rate, and $\lambda$ is the decay factor, which indicates the importance of future returns relative to the current return.
In this section, the present application uses the Q-Learning algorithm from reinforcement learning to plan paths for the crowd groups. The algorithm steps are as follows:
Input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$
Output: $Q$
1: initialize the Q matrix;
2: randomly select a state $s \in S_i$ as the initial state;
3: select an action $a \in A_i$ from the action set;
4: calculate the immediate reward $r(s, a)$ and the next state $s'$;
5: update the Q matrix by equation (17) until $s = s_{goal}$;
6: if the Q matrix no longer changes within the maximum number of iterations $E_{max}$, output $Q$.
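The steps above can be sketched as tabular Q-Learning on a small grid. This is a minimal sketch under stated assumptions, not the application's implementation: actions are chosen uniformly at random in step 3, moves are clamped at the grid boundary, the reward uses only the distance term with straight-line distance (the video-based similarity term is omitted), and the convergence check of step 6 is simplified to a fixed budget of `e_max` episodes. All function and parameter names are hypothetical.

```python
import math
import random

# Eight grid moves (dx, dy): E, W, S, N, SE, NE, SW, NW. Assumption: y grows northward.
ACTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 1), (-1, -1), (-1, 1)]


def move(state, action_index, width, height):
    """Apply an action and clamp the result to the grid (assumed boundary rule)."""
    dx, dy = ACTIONS[action_index]
    x = min(max(state[0] + dx, 0), width - 1)
    y = min(max(state[1] + dy, 0), height - 1)
    return (x, y)


def distance_reward(s, s_next, goal):
    """Distance term only: decrease in straight-line distance to the goal."""
    d_now = math.hypot(goal[0] - s[0], goal[1] - s[1])
    d_next = math.hypot(goal[0] - s_next[0], goal[1] - s_next[1])
    return d_now - d_next


def train_q(width, height, goal, alpha=0.5, lam=0.9, e_max=300,
            max_steps=1000, seed=0):
    rng = random.Random(seed)
    states = [(x, y) for x in range(width) for y in range(height)]
    n = len(ACTIONS)
    q = {(s, a): 0.0 for s in states for a in range(n)}   # step 1: initialize Q
    for _ in range(e_max):                                # fixed episode budget
        s = rng.choice(states)                            # step 2: random start
        for _ in range(max_steps):                        # safety cap per episode
            if s == goal:
                break
            a = rng.randrange(n)                          # step 3: pick an action
            s_next = move(s, a, width, height)            # step 4: next state
            r = distance_reward(s, s_next, goal)          #         and reward
            best_next = max(q[(s_next, b)] for b in range(n))
            # step 5: Q-Learning update
            q[(s, a)] += alpha * (r + lam * best_next - q[(s, a)])
            s = s_next
    return q


def greedy_path(q, start, goal, width, height, max_steps=50):
    """Follow the highest-valued action from start until the goal or the step cap."""
    path, s = [start], start
    while s != goal and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[(s, i)])
        s = move(s, a, width, height)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = train_q(5, 5, goal=(4, 4))
    print(greedy_path(q_table, (0, 0), (4, 4), 5, 5))
```

Because Q-Learning is off-policy, random action selection during training still lets the greedy policy read from the learned table head toward the goal.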
Step 3.3: based on the social characteristics of crowds, and building on the group paths obtained above, the application proposes two kinds of social group: a linear group and a leader-follower group. The walking trajectory of each pedestrian is obtained through the position offset factor $(\Delta x, \Delta y)$.
Assume the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group can then be expressed as:
$$\{(x_0 + \delta_1 \Delta x,\ y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\ y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\ y_n + \delta_2 \Delta y)\} \tag{18}$$
where $\delta_1$ and $\delta_2$ are influence factors. When the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ are related by a first-order (linear) function and take values in [-1, 1]; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in [-1, 1] and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$.
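Equation (18) shifts every point of the group path by the same per-pedestrian offset. The sketch below illustrates this under assumptions that are not in the application: the linear relation is written as `delta2 = k * delta1 + c` with hypothetical coefficients, normal samples are clipped to [-1, 1], and the defaults for `mu` and `sigma` are illustrative only.

```python
import random


def clip(value, low=-1.0, high=1.0):
    """Keep an influence factor inside [-1, 1]."""
    return max(low, min(high, value))


def individual_path(group_path, delta1, delta2, dx, dy):
    """Equation (18): offset each group waypoint by (delta1 * dx, delta2 * dy)."""
    return [(x + delta1 * dx, y + delta2 * dy) for x, y in group_path]


def linear_group_factors(delta1, k=1.0, c=0.0):
    """Linear group: delta2 is a first-order function of delta1 (k and c are assumed)."""
    return clip(delta1), clip(k * delta1 + c)


def leader_follower_factors(rng, mu=0.0, sigma=0.3):
    """Leader-follower group: both factors drawn from N(mu, sigma^2), then clipped."""
    return clip(rng.gauss(mu, sigma)), clip(rng.gauss(mu, sigma))


if __name__ == "__main__":
    group_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    rng = random.Random(0)
    d1, d2 = leader_follower_factors(rng)
    print(individual_path(group_path, d1, d2, dx=0.5, dy=0.5))
```

Since the factors are fixed per pedestrian, each individual trajectory is a translated copy of the group path, which preserves the group's overall shape.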
Step 3.4: the RVO (Reciprocal Velocity Obstacles) algorithm is used for collision avoidance among individuals and is coupled in real time with the crowd motion to realize the crowd motion simulation.
During crowd motion, the important parameters affecting the velocity computation of each individual i are shown in Table 1:
Table 1. Important parameters affecting the velocity computation of each individual i
In crowd motion, $AV_i$ denotes the set of admissible velocities of an individual; any $V_i'$ belonging to $AV_i$ satisfies:
$$\lVert V_i^{pref} \rVert \le V_i^{max} \tag{20}$$
where $\mathrm{penalty}_i(V_i')$ is the velocity penalty value of group i, $tc(V_i')$ is the expected time to collision of group i with the surrounding people, and $\lVert V_i^{pref} - V_i' \rVert$ is the magnitude of the difference between the desired velocity and the candidate velocity. The velocity at the next time step, $v_{t+1}$, must satisfy two conditions: a large time to collision with the other groups and a minimal deviation from the desired velocity.
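The two conditions on the next velocity can be illustrated with a simplified penalty-based selection. This sketch is not the full RVO construction: the time to collision is computed for discs moving at constant velocity, the penalty form `w / tc + deviation` follows the usual RVO literature and its weight `w` is an assumption not stated in this text, and the candidate sampling scheme and all names are hypothetical.

```python
import math


def time_to_collision(pa, va, pb, vb, radius_sum):
    """Time until two discs moving at constant velocity touch (inf if never)."""
    wx, wy = pa[0] - pb[0], pa[1] - pb[1]      # relative position
    ux, uy = va[0] - vb[0], va[1] - vb[1]      # relative velocity
    c = wx * wx + wy * wy - radius_sum ** 2
    if c <= 0:
        return 0.0                             # already overlapping
    a = ux * ux + uy * uy
    b = 2 * (wx * ux + wy * uy)
    disc = b * b - 4 * a * c
    if a == 0 or b >= 0 or disc < 0:
        return math.inf                        # not approaching, or paths miss
    return (-b - math.sqrt(disc)) / (2 * a)


def penalty(v_cand, v_pref, tc, w=1.0):
    """Assumed penalty: short collision times and deviation from v_pref both cost."""
    if tc == 0:
        return math.inf
    deviation = math.hypot(v_pref[0] - v_cand[0], v_pref[1] - v_cand[1])
    return w / tc + deviation


def choose_velocity(pos, v_pref, v_max, neighbors, radius=0.5, w=1.0,
                    n_angles=16, n_speeds=4):
    """Pick the sampled velocity with the lowest penalty.

    neighbors: list of (position, velocity, radius) tuples.
    Assumes |v_pref| <= v_max, as required by equation (20).
    """
    candidates = [v_pref, (0.0, 0.0)]
    for k in range(1, n_speeds + 1):
        speed = v_max * k / n_speeds
        for j in range(n_angles):
            angle = 2 * math.pi * j / n_angles
            candidates.append((speed * math.cos(angle), speed * math.sin(angle)))

    def score(v):
        tc = min((time_to_collision(pos, v, p, u, radius + r)
                  for p, u, r in neighbors), default=math.inf)
        return penalty(v, v_pref, tc, w)

    return min(candidates, key=score)


if __name__ == "__main__":
    obstacle = ((3.0, 0.0), (0.0, 0.0), 0.5)   # stationary neighbor straight ahead
    print(choose_velocity((0.0, 0.0), (1.0, 0.0), 1.5, [obstacle]))
```

With no neighbors the penalty reduces to the deviation term, so the preferred velocity is returned unchanged; a neighbor on the path raises the penalty of velocities that lead to an early collision.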
Step 4: generating realistic crowd animation
The realistic animation simulation system is a cross-platform simulation system developed with XNA technology. The three-dimensional realistic real-time rendering platform mainly consists of MS .NET Framework 4.0 and XNA 4.0; the scene and the motion paths are imported into the platform to generate the animation.
Embodiment 2
An object of the present embodiment is to provide a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements:
acquiring real video data, and carrying out crowd tracking according to the video data;
group division is carried out on the crowd according to the motion similarity, and path information of each group is obtained;
initializing scene information and crowd positions, and dividing the crowd into groups;
according to the path information of the groups in the video, training an optimal policy with reinforcement learning to plan the paths of the groups in the scene;
generating paths of individuals in the groups according to the path information of each group;
and performing collision detection among individuals, and performing real-time coupling with crowd motion in video data to realize crowd motion simulation.
Embodiment 3
An object of the present embodiment is to provide a computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements:
acquiring real video data, and carrying out crowd tracking according to the video data;
group division is carried out on the crowd according to the motion similarity, and path information of each group is obtained;
initializing scene information and crowd positions, and dividing the crowd into groups;
according to the path information of the groups in the video, training an optimal policy with reinforcement learning to plan the paths of the groups in the scene;
generating paths of individuals in the groups according to the path information of each group;
and performing collision detection among individuals, and performing real-time coupling with crowd motion in video data to realize crowd motion simulation.
One or more of the above embodiments have the following technical effects:
The present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method uses a group generation algorithm to model crowd grouping behavior, combines video data with a reinforcement learning method to obtain the group motion paths, and uses a position offset factor to obtain the individual paths. The method can realistically simulate the social behavior of a crowd in a dynamic environment, reproduce the self-organization phenomena of the crowd, reduce dependence on the amount of real data, and provide a reference for drawing up crowd evacuation plans.
It will be appreciated by those skilled in the art that the modules or steps of the present application described above may be implemented by general-purpose computer means, alternatively they may be implemented by program code executable by computing means, so that they may be stored in storage means and executed by computing means, or they may be fabricated separately as individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated as a single integrated circuit module. The present application is not limited to any specific combination of hardware and software.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
While the foregoing description of the embodiments of the present application has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the application, but rather, it is intended to cover all modifications or variations which may be resorted to without undue burden to those skilled in the art, having the benefit of the present application.
Claims (4)
1. The crowd evacuation simulation method integrating data driving and reinforcement learning is characterized by comprising the following steps of:
step 1: initializing crowd position and scene information;
step 2: establishing a clustering behavior modeling method based on data driving, which comprises the following steps:
step 2.1: crowd tracking is carried out based on video data, and individual track information is obtained;
step 2.2: defining individual similarities based on the similarities in distance and speed; in the group movement process, the similarity s (i, j) between the individuals i, j is calculated as follows:
wherein dis(i, j), vel(i, j) and ori(i, j) represent the distance similarity, velocity similarity and direction similarity, respectively; $w_d$, $w_v$ and $w_o$ represent the weights of the distance similarity, velocity similarity and direction similarity, respectively, with $w_d + w_v + w_o = 1$;
Step 2.3: grouping the people according to the individual similarity;
step 2.4: the group path information is calculated, specifically:
$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$,
wherein $p'_c$ represents the position information of a group at time c, its components being the weighted values of the x coordinate and the y coordinate, respectively;
step 3: a path planning method integrating reinforcement learning and data driving, comprising three parts: the first part combines the group path information in the video with the Q-Learning algorithm in reinforcement learning to train the optimal policy, thereby realizing group path planning; the second part obtains the individual paths through the position offset factor during the underlying individual movement; the third part avoids collisions between individuals by using the reciprocal velocity obstacles (RVO) method; the method comprises the following steps:
step 3.1: acquiring initialized crowd grouping information;
step 3.2: fusing the Q-Learning algorithm in the reinforcement learning method with the group data in the video to train the optimal policy; given a target state $s_{goal}$, the action-value function is learned through the simple iterative value update of the Q-Learning algorithm, comprising:
state set: a crowd state set $S_i$ is set, with state $s \in S_i$, where $s = (x, y)$ represents the position of the pedestrian at each moment; $s_{goal}$ is defined as the target state, and the crowd stops moving when it reaches the target state;
action set: an action set $A_i$ is defined for each state; an action $a \in A_i$ selects the state to move to next, wherein $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; taking an action in state $s = (x, y)$ yields a new state $s' = (x', y')$, represented by the position of the center point of the next grid cell;
reward function: given a target state $s_{goal}$, the reward obtained by taking action $a \in A_i$ in state $s \in S_i$ is:
wherein $r$ is the direct reward value; $s$ and $s'$ represent the current state and the new state, respectively; $w_1$ and $w_2$ are the weights of the distance function and the similarity function, respectively; $D$ is the difference between the distance from the current state position to the target position and the distance from the next state position to the target position; $r_d$ and $r_{sim}$ are the distance function and the similarity function, respectively, the distance function being expressed as:
$$r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$$
$s$ and $s'$ represent the current state and the new state, respectively, and the pathdist() function represents the distance from a position state to the target position state; a larger value of $r_d$ indicates that the new state is closer to the target position, i.e., more reward is generated;
the similarity function is expressed as:
$$r_{sim} = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
$a$ represents the position vector from the current state to the next-time state in the real data, and $b$ represents the position vector from the current state to the next-time state in the simulation; $r_{sim}$ uses cosine similarity to compare each simulated pedestrian step with the corresponding walking step in the video, and the closer the value of $r_{sim}$ is to 1, the more similar the pedestrian trajectories are, i.e., the more reward is generated;
training the optimal policy: given a target state $s_{goal}$, the action-value function is learned through the simple iterative value update of the Q-Learning algorithm:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right] \tag{1}$$
wherein $s$ and $a$ represent the current state and action, $s'$ and $a'$ represent the next state and action, $r(s, a)$ represents the direct reward value for taking action $a$ in state $s$, $Q(s, a)$ represents the trained path estimate, $\max_{a'} Q(s', a')$ represents the optimal path estimate, $\alpha$ is the learning rate, and $\lambda$ is the decay factor, which indicates the importance of future rewards relative to the current reward;
the Q-Learning algorithm in reinforcement learning is used to plan paths for the crowd groups, the algorithm comprising the following steps:
input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$;
output: $Q$;
initializing the Q matrix;
randomly selecting a state $s \in S_i$ as the initial state;
selecting an action $a \in A_i$ from the action set;
calculating the direct reward value $r(s, a)$ and the next state $s'$;
updating the Q matrix by equation (1) until $s = s_{goal}$;
if the Q matrix no longer changes within the maximum number of iterations $E_{max}$, outputting $Q$;
step 3.3: on the basis of the obtained group paths, two kinds of social group are proposed: a linear group and a leader-follower group; the walking trajectory of each pedestrian is obtained through the position offset factor $(\Delta x, \Delta y)$, specifically:
assume the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group can then be expressed as: $\{(x_0 + \delta_1 \Delta x,\ y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\ y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\ y_n + \delta_2 \Delta y)\}$
wherein $\delta_1$ and $\delta_2$ are influence factors; when the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ are related by a first-order (linear) function and take values in [-1, 1]; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in [-1, 1] and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$;
step 3.4: the RVO algorithm is used to avoid collisions among individuals and is coupled in real time with the crowd motion, thereby realizing the crowd motion simulation;
step 4: and generating the realistic crowd animation.
2. The crowd evacuation simulation method integrating data driving and reinforcement learning of claim 1, wherein grouping the crowd according to the individual motion similarity comprises:
initializing m individuals as cluster centers; for each remaining individual, calculating its individual motion similarity to each of the m cluster centers and assigning it to the cluster with the largest similarity value;
for each cluster, generating a virtual cluster center, calculating the individual motion similarity between every individual in the cluster and the virtual cluster center, and taking the individual with the highest similarity to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;
iteratively updating the m cluster centers until they no longer change.
3. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.
4. A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811382707.8A CN109543285B (en) | 2018-11-20 | 2018-11-20 | Crowd evacuation simulation method and system integrating data driving and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109543285A (en) | 2019-03-29 |
CN109543285B (en) | 2023-05-09 |
Family
ID=65848584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811382707.8A Active CN109543285B (en) | 2018-11-20 | 2018-11-20 | Crowd evacuation simulation method and system integrating data driving and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543285B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20231129. Address after: No. 1823, Building A2-5, Hanyu Jingu, No. 7000 Jingshi East Road, High tech Zone, Jinan City, Shandong Province, 250000. Patentee after: Shandong Data Trading Co.,Ltd. Address before: 250014 No. 88, Wenhua East Road, Lixia District, Shandong, Ji'nan. Patentee before: SHANDONG NORMAL University |