CN109543285B - Crowd evacuation simulation method and system integrating data driving and reinforcement learning - Google Patents



Publication number
CN109543285B
CN109543285B CN201811382707.8A CN201811382707A CN109543285B CN 109543285 B CN109543285 B CN 109543285B CN 201811382707 A CN201811382707 A CN 201811382707A CN 109543285 B CN109543285 B CN 109543285B
Authority
CN
China
Prior art keywords
crowd
state
group
similarity
individuals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811382707.8A
Other languages
Chinese (zh)
Other versions
CN109543285A (en)
Inventor
张桂娟
姚珍珍
陆佃杰
刘弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Data Trading Co ltd
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN201811382707.8A
Publication of CN109543285A
Application granted
Publication of CN109543285B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a crowd evacuation simulation method and system integrating data driving and reinforcement learning. The method comprises the following steps: acquiring real video data and tracking the crowd in the video data; dividing the crowd into groups according to motion similarity and obtaining the path information of each group; initializing scene information and crowd positions, and dividing the simulated crowd into groups; training an optimal strategy by reinforcement learning, guided by the path information of the groups in the video, to plan the paths of the groups in the scene; generating the paths of the individuals in each group from the path information of that group; and performing collision detection between individuals, coupled in real time with the crowd motion in the video data, to realize crowd motion simulation. The invention can realistically simulate the social behavior of a crowd in a dynamic environment, reproduce the self-organization of the crowd, reduce the dependence on the amount of real data, and provide a reference for drawing up crowd evacuation plans.

Description

Crowd evacuation simulation method and system integrating data driving and reinforcement learning
Technical Field
The disclosure belongs to the field of crowd evacuation computer simulation, and particularly relates to a crowd evacuation simulation method integrating data driving and reinforcement learning.
Background
Crowd modeling and simulation is a research field that has attracted increasing attention from industry, academia and government departments in recent years. With the rapid development of China's society and economy, living standards keep improving and people travel more frequently. In densely crowded public places such as railway stations, tourist attractions and shopping plazas, pedestrian flow can become very large within a short time. A small disturbance in the crowd can greatly reduce evacuation efficiency and poses a serious safety hazard; if the crowd cannot be controlled effectively, crushing and trampling incidents can easily occur.
Simulating how a real crowd evacuates during an emergency therefore makes it possible to identify potential crowding and trampling risks in advance, and has significant research value. Traditional crowd evacuation simulation methods mainly aim to improve evacuation efficiency in a specific scene, but their many simplifying assumptions reduce the realism of the simulated crowd motion. Existing data-driven methods improve the realism of crowd simulation, but they depend heavily on the amount of data, and most of them cannot adapt to dynamic scenes.
Disclosure of Invention
To overcome the shortcomings of the prior art, the present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method first extracts real data from video and models crowd behavior; it then incorporates a reinforcement learning method so that limited real data can drive crowd motion in different scenes. Crowd motion is thereby simulated more realistically, providing a reference for drawing up crowd evacuation plans.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
A crowd evacuation simulation method integrating data driving and reinforcement learning comprises the following steps:
acquiring real video data, and tracking the crowd in the video data;
dividing the crowd into groups according to motion similarity, and obtaining the path information of each group;
initializing scene information and crowd positions, and dividing the crowd into groups;
training an optimal strategy by reinforcement learning, guided by the path information of the groups in the video, to plan the paths of the groups in the scene;
generating the paths of the individuals in each group from the path information of that group;
and performing collision detection between individuals, coupled in real time with the crowd motion in the video data, to realize crowd motion simulation.
Further, dividing the crowd into groups according to motion similarity includes:
initializing m individuals as cluster centers, calculating for each remaining individual its motion similarity to each of the m cluster centers, and assigning it to the cluster with the largest similarity value;
for each cluster, generating a virtual cluster center, calculating the motion similarity between every individual in the cluster and the virtual cluster center, and taking the individual most similar to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;
iteratively updating the m cluster centers until they no longer change.
Further, the path information $E_r$ of a group is calculated as follows:
$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$
Figure GDA0001898885920000021
where $p'_c$ is the position of the group at time $c$, $\omega_i$ and $\omega_j$ are the weights of the x coordinate and the y coordinate, respectively, of the individuals in the group, $1 \le i \le n$, and $n$ is the number of individuals in the group.
Further, the motion similarity is expressed as:
Figure GDA0001898885920000022
where dis(i, j), vel(i, j) and ori(i, j) are the distance similarity, speed similarity and direction similarity, respectively, and $w_d$, $w_v$ and $w_o$ are the corresponding weights, with $w_d + w_v + w_o = 1$.
Further, planning the paths of the groups in the scene includes:
discretizing the scene into a grid and representing each position state by the center of its cell; defining an action set $A_i$ for each state, where an action $a \in A_i$ selects the state to move to next;
constructing a reward function from the positional relation between the state reached by an action and the target state, and from the similarity between the group's trajectory in the video and its trajectory in the scene;
and performing path planning on each group in the scene based on the Q-learning method.
Further, performing path planning on each group in the scene based on the Q-learning method includes:
given a target state $s_{goal}$, initializing the Q matrix;
randomly selecting a state $s \in S_i$ as the initial state, and selecting an action $a \in A_i$ from the action set;
calculating the immediate reward $r(s, a)$ and the next state $s'$, and updating the Q matrix by the following formula until $s = s_{goal}$:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $\alpha$ is the learning rate, $\lambda$ is the attenuation factor, and $s'$ is the new state reached by taking the action in state $s$;
if the Q matrix no longer changes within the maximum number of iterations $E_{max}$, output
Figure GDA0001898885920000034
Further, the reward function is as follows:
Figure GDA0001898885920000032
where $w_1$ and $w_2$ are the weights of the distance function $r_d$ and the similarity function $r_{sim}$, respectively, and $D$ is the difference between the distance from the current state position to the target position and the distance from the next state position to the target position;
distance function: $r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$
where $s$ is the current state, $s'$ is the state after an action is performed, and $\mathrm{pathdist}(s_{goal}, s)$ is the distance from state $s$ to the target state;
similarity function:
$$r_{sim} = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$$
where the group trajectory in the real video is mapped into the three-dimensional space in which the scene is located, $\mathbf{a}$ is the position vector from the current state to the next state in the video, and $\mathbf{b}$ is the position vector from the current state to the next state in the simulation.
Further, the method for generating the path of the individual in the group according to the path information of each group is as follows:
assume that the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group is then $\{(x_0 + \delta_1 \Delta x, y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x, y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x, y_n + \delta_2 \Delta y)\}$;
where $\delta_1$ and $\delta_2$ are influence factors. When the pedestrians form a linear group, $\delta_1$ and $\delta_2$ are related by a first-order (linear) function and take values in $[-1, 1]$; when the pedestrians form a leader-follower group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$.
Further, the inter-individual collision detection employs RVO technology.
One or more embodiments provide a computer readable storage medium having a computer program stored thereon, wherein the program when executed by a processor implements the crowd evacuation simulation method of fused data driving and reinforcement learning.
One or more embodiments provide a computer system including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the crowd evacuation simulation method of fusion data driving and reinforcement learning when executing the program.
The one or more of the above technical solutions have the following beneficial effects:
the present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method utilizes a group generation algorithm to model crowd grouping behaviors, combines video data and a reinforcement learning method to obtain a group motion path, and utilizes a position offset factor to obtain an individual path. The method can simulate the social behavior of the crowd in a dynamic environment realistically, embody the self-organization phenomenon of the crowd, reduce the dependence on real data quantity, and provide reference for the establishment of a crowd evacuation scheme.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart of a crowd evacuation simulation method integrating data driving and reinforcement learning according to an embodiment of the disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments and features of embodiments in this application may be combined with each other without conflict.
Reinforcement learning is known to produce powerful control strategies: it can learn to imitate a wide range of example actions while adapting to changes in morphology and accomplishing user-specified goals. Combining data-driven techniques with reinforcement learning is therefore significant for simulating a real crowd from limited data. First, the crowd in a real video is tracked, crowd grouping behavior is modeled from distance and speed features, and trajectories are fitted to obtain group path information; this information guides the next stage, improving both the efficiency of path computation and the realism of the simulation. Second, a path planning method integrating data driving and reinforcement learning is established with a two-layer mechanism: the upper layer combines the group path information from the video with the Q-Learning algorithm of reinforcement learning to train an optimal strategy and plan group paths; the lower layer obtains individual paths through a position offset factor based on social characteristics and avoids collisions with the reciprocal velocity obstacles (RVO) method. Finally, the crowd animation is generated with a realistic rendering method.
Example 1
The embodiment discloses a crowd evacuation simulation method integrating data driving and reinforcement learning, as shown in fig. 1, comprising the following steps:
Step 1: initializing crowd position and scene information;
In step 1, individual positions are initialized randomly within the scene, taking scene information and obstacle information into account so that no individual is placed on an obstacle.
Step 2: establishing a clustering behavior modeling method based on data driving;
The data-driven clustering behavior modeling method of step 2 comprises four parts: video capture and data extraction, definition of grouping behavior based on motion features, the group generation algorithm, and calculation of group path information.
Step 2.1: crowd tracking is carried out based on video data, and individual track information is obtained;
The crowd is filmed with a Xiaoyi ("small ant") smart camera, which records the video stream in real time.
For a real video, the crowd is tracked with the TLD (Tracking-Learning-Detection) target tracking algorithm to obtain the trajectory of each individual. TLD integrates a tracker, a detector and a learning module, so it can track a continuously moving target and can re-detect and resume tracking a target that reappears after being occluded, which makes it robust to occlusion.
When tracking a crowd video, environmental features (such as walls and obstacles) are identified first, and the two-dimensional trajectory of each individual is then tracked. If a trajectory is unsatisfactory, the user browses the video frames, specifies the position of the target in an intermediate frame, and applies bidirectional tracking to the two separate time intervals (the start and the middle time intervals). In this way, the user can adaptively modify the trajectory until a satisfactory result is obtained. After tracking is complete, the trajectory information $E$ of each person is obtained, where each trajectory can be expressed as:
$$E = \{(p_1, v_1, o_1, t_1), \ldots, (p_i, v_i, o_i, t_i), \ldots, (p_n, v_n, o_n, t_n)\} \tag{1}$$
where $p_i = (x_i, y_i)$ is the position in each frame of the trajectory, $v_i$ is the speed of the individual at each moment, $o_i$ is the direction of the velocity at each moment, $n$ is the total number of points in the trajectory, and $t_i$ is the time at which each frame is extracted.
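As an illustration of the trajectory record in formula (1), the following minimal Python sketch stores one (p, v, o, t) sample per frame. The names `TrackPoint` and `build_trajectory` are hypothetical, and the text does not say how speed and direction are derived from the tracked positions; forward finite differences are assumed here.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackPoint:
    p: tuple[float, float]  # position (x, y)
    v: float                # speed magnitude
    o: float                # velocity direction in radians
    t: float                # time at which the frame was extracted


def build_trajectory(positions, times):
    """Build E = [(p, v, o, t), ...] from tracked positions.

    Assumption: speed and direction are forward finite differences;
    the last sample reuses the values of the previous sample.
    """
    if len(positions) != len(times) or len(positions) < 2:
        raise ValueError("need at least two samples with matching timestamps")
    track = []
    for k in range(len(positions) - 1):
        (x0, y0), (x1, y1) = positions[k], positions[k + 1]
        dt = times[k + 1] - times[k]
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / dt
        heading = math.atan2(dy, dx)
        track.append(TrackPoint((x0, y0), speed, heading, times[k]))
    last = track[-1]
    track.append(TrackPoint(positions[-1], last.v, last.o, times[-1]))
    return track
```

For example, `build_trajectory([(0.0, 0.0), (3.0, 4.0)], [0.0, 1.0])` yields two samples, the first with speed 5.0.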
Step 2.2: defining individual similarity based on similarity in distance and velocity;
Using the video data, individual similarity is defined from the distance similarity, the sigmoid function, the speed magnitude similarity and the speed direction similarity observed in clustering behavior. Specifically:
Distance similarity: the distance similarity dis(i, j) of individuals i and j is calculated using the Euclidean distance,
$$dis(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{2}$$
where $x_i$ is the abscissa of the pedestrian position and $y_i$ is the ordinate of the pedestrian position.
For the speed and direction similarities, the sigmoid function of formula (3) is applied in the speed difference amplitude function and the angle difference amplitude function, so that each similarity result is mapped into [0, 1]; this speeds up training of the grouping model and avoids overly large differences between values in the mapped set.
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
Speed magnitude similarity: the speed magnitude similarity between individuals i and j is represented by a speed difference amplitude function vel(i, j); the smaller the result, the more similar the individuals.
Figure GDA0001898885920000062
where $v_i - v_j$ is the speed difference between individuals i and j; the squared difference is used to make the difference between the two more pronounced.
Speed direction similarity: the direction similarity between individuals i and j is represented by an angle difference amplitude function ori(i, j); the smaller the result, the more similar the individuals.
Figure GDA0001898885920000063
where $o_i - o_j$ is the difference in velocity direction between individuals i and j; the squared difference is used to make the difference between the two more pronounced.
Finally, in the group movement process, the similarity s (i, j) between the individuals i, j is calculated as follows:
Figure GDA0001898885920000064
where $w_d + w_v + w_o = 1$, $w_d$ is the weight of the distance similarity feature, $w_v$ is the weight of the speed magnitude similarity feature, and $w_o$ is the weight of the speed direction similarity feature. The larger the value of s(i, j), the more similar the two individuals.
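The following Python sketch shows one way the three components could be computed and combined. It is not the patent's formulas (4) to (6), which are not reproduced in this text: applying the sigmoid directly to the squared difference, wrapping the angle difference, and converting the weighted cost into a similarity with 1 / (1 + cost) are all assumptions, as are the default weights. Note that a sigmoid of a non-negative argument lies in [0.5, 1).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dis(p_i, p_j):
    """Euclidean distance between two positions (formula (2))."""
    return math.hypot(p_i[0] - p_j[0], p_i[1] - p_j[1])


def vel(v_i, v_j):
    """Assumed form: sigmoid of the squared speed difference (smaller = more alike)."""
    return sigmoid((v_i - v_j) ** 2)


def ori(o_i, o_j):
    """Assumed form: sigmoid of the squared angle difference, wrapped to [-pi, pi]."""
    d = math.atan2(math.sin(o_i - o_j), math.cos(o_i - o_j))
    return sigmoid(d ** 2)


def similarity(a, b, w_d=0.4, w_v=0.3, w_o=0.3):
    """Combined similarity of two individuals given as (position, speed, direction).

    Assumption: the weighted sum of the three "smaller is more alike" terms is
    turned into a "larger is more alike" score with 1 / (1 + cost).
    """
    cost = w_d * dis(a[0], b[0]) + w_v * vel(a[1], b[1]) + w_o * ori(a[2], b[2])
    return 1.0 / (1.0 + cost)
```

With these definitions, two individuals at the same position with the same speed and direction score higher than two individuals that differ in any component.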
Step 2.3: grouping the crowd by using a group generation algorithm according to the individual similarity;
The group generation algorithm comprises the following steps:
(1) Select m individuals as cluster centers $(\beta_1, \ldots, \beta_i, \ldots, \beta_m)$;
(2) Calculate the similarity between each remaining individual in the crowd and the m cluster centers using formula (6), and assign each individual to the cluster with the largest similarity value;
(3) For each cluster, generate a virtual cluster center $O_v$ using formulas (7) to (10);
(4) Calculate the similarity $s(O_v, P_i)$ between every individual in the cluster and its virtual cluster center according to formula (6), where $P_i \in cluster_n$, $1 \le n \le m$, and n is the cluster index; select the element $P_i$ with the greatest similarity to the virtual cluster center as the new cluster center;
(5) Repeat steps (2), (3) and (4) until the m cluster centers no longer change;
(6) The crowd is now divided into m groups; the group of individual i is C(i), where $C(i) \in C$, $C = \{C(1), \ldots, C(m)\}$, and C is the set of groups.
In (4), the cluster center of each group is updated; for each group, a virtual cluster center $O_v$ is generated according to formulas (7) to (10):
Figure GDA0001898885920000072
Figure GDA0001898885920000073
Figure GDA0001898885920000074
Figure GDA0001898885920000075
where 1 ≤ j ≤ agennum, agennum is the total number of individuals, num is the number of individuals in the cluster,
Figure GDA0001898885920000076
represents the abscissa of individual i,
Figure GDA0001898885920000077
represents the ordinate of individual i, people[i].vel[j] represents the speed similarity between individuals i and j, and people[i].ori[j] represents the direction similarity between individuals i and j.
Group generation is subject to two constraints: (1) each group contains at least one individual; (2) each individual belongs to exactly one group.
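The group generation steps can be sketched in Python as follows. This is a sketch under stated assumptions rather than the patented procedure: individuals are tuples (x, y, speed, direction), the first m individuals serve as initial centers, `default_similarity` is a hypothetical stand-in for formula (6), and the virtual center is the plain mean of the members (angles are averaged naively, without wrap-around handling).

```python
import math


def default_similarity(a, b):
    """Hypothetical similarity, larger = more alike (not the patent's formula (6))."""
    d = math.hypot(a[0] - b[0], a[1] - b[1])
    return 1.0 / (1.0 + d + abs(a[2] - b[2]) + abs(a[3] - b[3]))


def virtual_center(members):
    """Mean position, speed and direction of the cluster members."""
    n = len(members)
    return tuple(sum(m[k] for m in members) / n for k in range(4))


def generate_groups(agents, m, similarity=default_similarity, max_iter=100):
    """Return a list of m groups, each a list of indices into `agents`."""
    if not 1 <= m <= len(agents):
        raise ValueError("m must be between 1 and the number of agents")
    centers = list(range(m))  # assumption: the first m individuals start as centers
    clusters = {}
    for _ in range(max_iter):
        # Step (2): assign every non-center individual to its most similar center.
        clusters = {c: [c] for c in centers}
        for idx in range(len(agents)):
            if idx in clusters:
                continue
            best = max(centers, key=lambda c: similarity(agents[idx], agents[c]))
            clusters[best].append(idx)
        # Steps (3) and (4): the member closest to the virtual center becomes the center.
        new_centers = []
        for c in centers:
            members = clusters[c]
            vc = virtual_center([agents[i] for i in members])
            new_centers.append(max(members, key=lambda i: similarity(agents[i], vc)))
        # Step (5): stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return list(clusters.values())
```

Because every cluster always contains its own center and every other individual is assigned to exactly one center, both constraints above hold by construction.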
Step 2.4: group path information is calculated.
The path information of a group is calculated as:
$$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k) \tag{11}$$
Figure GDA0001898885920000078
where $p'_c$ is the position of the group at time c, and $\omega_i$ and $\omega_j$ are the weights of the x and y coordinates, respectively.
When deriving a group path, possible collisions between the fitted group path and obstacles must be considered; where a fitted position collides with an obstacle, it is replaced by the position of one of the individuals in the group.
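A minimal Python sketch of the group path computation follows. Formula (12) is not reproduced in this text, so the weighted mean below is an assumption: one weight per individual is applied to both coordinates (equal weights by default), whereas the patent names separate weights for x and y. The obstacle test `is_blocked` is a hypothetical callback, and the fallback picks the first member whose own position is not blocked.

```python
def group_path(trajectories, weights=None, is_blocked=lambda p: False):
    """Fit one path for a group from its members' position sequences.

    trajectories: list of n position lists [(x, y), ...], one per individual.
    weights:      optional list of n weights summing to 1 (assumed equal if omitted).
    is_blocked:   hypothetical predicate telling whether a point lies in an obstacle.
    """
    n = len(trajectories)
    if weights is None:
        weights = [1.0 / n] * n
    k = min(len(tr) for tr in trajectories)  # use the common number of frames
    path = []
    for c in range(k):
        x = sum(w * tr[c][0] for w, tr in zip(weights, trajectories))
        y = sum(w * tr[c][1] for w, tr in zip(weights, trajectories))
        p = (x, y)
        if is_blocked(p):
            # Replace the fitted point with the position of one real individual.
            p = next((tr[c] for tr in trajectories if not is_blocked(tr[c])), p)
        path.append(p)
    return path
```

For two individuals walking along y = 0 and y = 2, the fitted group path runs along y = 1 unless that point is blocked.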
Step 3: a path planning method integrating reinforcement learning and data driving.
Step 3 provides a path planning method that integrates data driving and reinforcement learning. It consists of three parts: the first combines the group path information from the video with the Q-Learning algorithm of reinforcement learning to train an optimal strategy and plan group paths; the second obtains individual paths through a position offset factor during the movement of individuals at the lower layer; the third avoids collisions between individuals using the reciprocal velocity obstacles method.
Step 3.1: grouping the initialized crowd by using the group generation algorithm;
Step 3.2: combining the Q-Learning algorithm of reinforcement learning with the group data from the video to train an optimal strategy; given a target state $s_{goal}$, the action-value function is learned through the simple iterative value update of the Q-Learning algorithm.
State set: define the crowd state set $S_i$, with state $s \in S_i$,
$$s = (x, y) \tag{13}$$
When planning a path for a group, the scene is discretized into a grid and each position state is represented by the center of its cell; discretizing the crowd state greatly reduces the number of states. Here $s = (x, y)$ is the position of the pedestrian at each moment, and $s_{goal}$ is the target state; the crowd stops moving once it reaches the target state.
Action set: define an action set $A_i$ for each state. An action $a \in A_i$ selects the state to move to next, where $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; from the current state s, choosing a different direction leads to a different state. Taking an action in state $s = (x, y)$ produces a new state $s' = (x', y')$, represented by the center point of the next grid cell.
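The state and action sets can be sketched in Python as below. The cell size, the axis orientation (north as +y), and the rule that a move off the grid or into an obstacle leaves the state unchanged are assumptions made for illustration; the names `to_state`, `cell_center` and `step` are hypothetical.

```python
# Eight compass moves as (dx, dy) offsets in grid cells; north is assumed to be +y.
ACTIONS = {
    "east": (1, 0), "west": (-1, 0), "south": (0, -1), "north": (0, 1),
    "southeast": (1, -1), "northeast": (1, 1),
    "southwest": (-1, -1), "northwest": (-1, 1),
}


def to_state(x, y, cell):
    """Map a continuous position to the index of the grid cell containing it."""
    return (int(x // cell), int(y // cell))


def cell_center(state, cell):
    """Continuous coordinates of the center of a grid cell."""
    return ((state[0] + 0.5) * cell, (state[1] + 0.5) * cell)


def step(state, action, width, height, obstacles=frozenset()):
    """Next state after taking `action`.

    Assumption: a move that leaves the grid or enters an obstacle cell
    keeps the group in its current state.
    """
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
    if not inside or nxt in obstacles:
        return state
    return nxt
```

For instance, with a cell size of 0.5, the position (2.3, 0.9) falls in cell (4, 1), whose center is (2.25, 0.75).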
Reward function: given a target state $s_{goal}$, the reward obtained by taking action $a \in A_i$ in state $s \in S_i$ is:
Figure GDA0001898885920000081
where r is the immediate reward value, s and s' are the current state and the new state, respectively, $w_1$ and $w_2$ are the weights of the distance function and the similarity function, respectively, D is the difference between the distance from the current state position to the target position and the distance from the next state position to the target position, and $r_d$ and $r_{sim}$ are the distance function and the similarity function, respectively. The distance function is expressed as:
$$r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s') \tag{15}$$
where s and s' are the current state and the new state, respectively, and pathdist() gives the distance from a position state to the target position state. A larger value of $r_d$ means the move brings the group closer to the target position and therefore earns a larger reward. The similarity between trajectories is calculated using cosine similarity: the trajectories are first mapped into three-dimensional space, and the cosine of the angle between the two group vectors then measures their similarity. The cosine lies in $[-1, 1]$; the closer it is to 1, the more similar the two group trajectories. The similarity function is expressed as:
$$r_{sim} = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} \tag{16}$$
where $\mathbf{a}$ is the position vector from the current state to the next state in the real data, and $\mathbf{b}$ is the position vector from the current state to the next state in the simulation. $r_{sim}$ uses cosine similarity to compare each step a pedestrian takes in the simulation with the corresponding step in the video; the closer $r_{sim}$ is to 1, the more similar the pedestrian's trajectory and the larger the reward.
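The two reward components can be sketched in Python as follows. Only $r_d$ and $r_{sim}$ are shown; how formula (14) combines them with $w_1$, $w_2$ and $D$ is not reproduced in this text and is deliberately left out. Using the straight-line distance for pathdist and returning 0 for a zero-length vector are assumptions.

```python
import math


def pathdist(goal, s):
    """Assumed distance measure: straight-line distance from state s to the goal."""
    return math.hypot(goal[0] - s[0], goal[1] - s[1])


def r_d(goal, s, s_next):
    """Distance term of formula (15): positive when the move approaches the goal."""
    return pathdist(goal, s) - pathdist(goal, s_next)


def r_sim(a, b):
    """Cosine similarity of formula (16) between a video step a and a simulated step b."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero-length step carries no direction information
    return (a[0] * b[0] + a[1] * b[1]) / (norm_a * norm_b)
```

A step in the same direction as the video gives `r_sim` equal to 1, a perpendicular step gives 0, and an opposite step gives -1.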
Training the optimal strategy: given a target state $s_{goal}$, the action-value function is learned through the simple iterative value update of the Q-Learning algorithm:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right] \tag{17}$$
where s and a are the current state and action, and s' and a' are the next state and action. r(s, a) is the immediate reward of taking a in s, Q(s, a) is the current path training estimate, $\max_{a'} Q(s', a')$ is the estimate of the optimal path value from the next state, α is the learning rate, and λ is the attenuation factor, which expresses the importance of future returns relative to the current return.
In this section, the present application utilizes the Q-Learning algorithm in reinforcement Learning to plan paths for crowd groups, the algorithm steps are as follows:
Input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$
Output: $Q$
1: Initialize the Q matrix;
2: Randomly select a state $s \in S_i$ as the initial state;
3: Select an action $a \in A_i$ from the action set;
4: Compute the immediate reward $r(s, a)$ and the next state $s'$;
5: Update the Q matrix by equation (17), repeating until $s = s_{goal}$;
6: If the Q matrix no longer changes within the maximum number of iterations $E_{max}$, output $Q$.
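The steps above can be sketched as a small tabular Q-Learning loop in Python. This is a minimal illustration under stated assumptions, not the implementation of the present application: the square grid, the clamping rule at the boundary, the epsilon-greedy action selection (`eps`), the per-episode step cap, and the use of the distance term alone as the reward are all choices made for the sketch. It also runs a fixed number of episodes rather than checking that the Q matrix has stopped changing.

```python
import math
import random

# Eight grid moves standing in for east, west, south, north and the diagonals.
ACTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 1), (-1, -1), (-1, 1)]

def step(s, a, size):
    """Apply move a to state s, clamping to the grid (assumed boundary rule)."""
    return (min(max(s[0] + a[0], 0), size - 1), min(max(s[1] + a[1], 0), size - 1))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def q_learning(size, goal, lam=0.9, alpha=0.5, e_max=2000, eps=0.3, seed=0):
    rng = random.Random(seed)
    states = [(x, y) for x in range(size) for y in range(size)]
    n = len(ACTIONS)
    Q = {(s, i): 0.0 for s in states for i in range(n)}        # step 1
    for _ in range(e_max):
        s = rng.choice(states)                                  # step 2
        steps = 0
        while s != goal and steps < 4 * size * size:
            if rng.random() < eps:                              # step 3 (explore)
                i = rng.randrange(n)
            else:                                               # step 3 (exploit)
                i = max(range(n), key=lambda k: Q[(s, k)])
            s2 = step(s, ACTIONS[i], size)                      # step 4
            r = dist(goal, s) - dist(goal, s2)                  # distance term only
            best_next = 0.0 if s2 == goal else max(Q[(s2, k)] for k in range(n))
            Q[(s, i)] += alpha * (r + lam * best_next - Q[(s, i)])  # step 5
            s, steps = s2, steps + 1
    return Q

def greedy_path(Q, start, goal, size):
    """Follow the highest-valued action from start until the goal is reached."""
    path, s = [start], start
    while s != goal and len(path) <= size * size:
        i = max(range(len(ACTIONS)), key=lambda k: Q[(s, k)])
        s = step(s, ACTIONS[i], size)
        path.append(s)
    return path
```

Calling `greedy_path(q_learning(5, (4, 4)), (0, 0), (4, 4), 5)` returns a list of grid cells that starts at (0, 0) and follows the learned values toward the goal cell.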
Step 3.3: Based on the social characteristics of crowds, and building on the group paths obtained above, the present application proposes two kinds of social groups: the linear group and the leader-follower group. The walking trajectory of each pedestrian is obtained through the position offset factor $(\Delta x, \Delta y)$.
Assume the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$. The path sequence of a pedestrian within the group can then be expressed as:
$$\{(x_0 + \delta_1 \Delta x,\ y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\ y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\ y_n + \delta_2 \Delta y)\} \tag{18}$$
where $\delta_1$ and $\delta_2$ are influence factors. When the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ satisfy a first-order (linear) functional relationship and take values in $[-1, 1]$; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$.
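The offset construction of equation (18) can be illustrated with a short Python sketch. The helper names, the even spacing of $\delta_1$, the linear coefficients `k` and `c`, the default `mu` and `sigma`, and the clipping of normal draws to $[-1, 1]$ are all assumptions introduced here for illustration; the text above specifies only the form of the relationship and the value range.

```python
import random

def clamp(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

def individual_path(group_path, delta1, delta2, dx, dy):
    """Shift every group waypoint by (delta1 * dx, delta2 * dy), as in equation (18)."""
    return [(x + delta1 * dx, y + delta2 * dy) for x, y in group_path]

def linear_group_factors(n, k=1.0, c=0.0):
    """Linear group: delta2 = k * delta1 + c (hypothetical first-order relation).

    delta1 is spread evenly over [-1, 1], one value per pedestrian.
    """
    d1 = [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(v, clamp(k * v + c)) for v in d1]

def leader_follower_factors(n, mu=0.0, sigma=0.4, rng=None):
    """Leader-follower group: normal draws, clipped to [-1, 1] (assumed handling)."""
    rng = rng or random.Random()
    return [(clamp(rng.gauss(mu, sigma)), clamp(rng.gauss(mu, sigma)))
            for _ in range(n)]

# Usage: three pedestrians of a linear group following one group path.
group_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
paths = [individual_path(group_path, d1, d2, dx=0.5, dy=0.5)
         for d1, d2 in linear_group_factors(3)]
```

Clipping slightly distorts the tails of the normal distribution; resampling out-of-range draws would be an equally plausible choice.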
Step 3.4: Collision avoidance among individuals is performed using the Reciprocal Velocity Obstacles (RVO) algorithm and is coupled in real time with the crowd motion to realize crowd motion simulation.
During crowd movement, the important parameters affecting the velocity computation of each individual $i$ are shown in Table 1:
Table 1. Important parameters affecting the computed velocity of each individual $i$
Figure GDA0001898885920000101
During crowd movement, $AV_i$ denotes the set of admissible velocities of an individual; any velocity $V_i'$ belonging to $AV_i$ satisfies:
Figure GDA0001898885920000102
$$\|V_i^{pref}\| \le V_i^{max} \tag{20}$$
Figure GDA0001898885920000103
Figure GDA0001898885920000104
where $\mathrm{penalty}_i(V_i')$ is the velocity penalty value of group $i$, $tc_i(V_i')$ is the expected time to collision of group $i$ with the surrounding people, and $\|V_i^{pref} - V_i'\|$ is the magnitude of the difference between the desired velocity and the candidate velocity. The velocity $v_{t+1}$ at the next time step must satisfy two conditions: a large time to collision with other groups and a minimal deviation from the desired velocity.
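The trade-off described above, between a long time to collision and a small deviation from the desired velocity, can be sketched in Python as follows. This is a simplified illustration and not the RVO algorithm itself: the penalty form `w / tc + ||v_pref - v||` follows the commonly published RVO formulation and is an assumption about the equations here, neighbours are assumed to keep their current velocities, and the candidate velocities are sampled on a hypothetical polar grid.

```python
import math

def time_to_collision(p_rel, v_rel, radius_sum):
    """Earliest t >= 0 at which two discs touch, or math.inf if they never do.

    p_rel: neighbour position minus own position.
    v_rel: own velocity minus neighbour velocity.
    """
    a = v_rel[0] ** 2 + v_rel[1] ** 2
    b = -2.0 * (p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1])
    c = p_rel[0] ** 2 + p_rel[1] ** 2 - radius_sum ** 2
    if c <= 0.0:
        return 0.0                      # discs already overlap
    if a == 0.0:
        return math.inf                 # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf                 # paths never come within radius_sum
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf

def penalty(v, pos, v_pref, neighbours, radius, w=1.0):
    """Assumed penalty: w / (time to collision) + deviation from desired velocity."""
    tc = min((time_to_collision((p[0] - pos[0], p[1] - pos[1]),
                                (v[0] - u[0], v[1] - u[1]), radius + r)
              for p, u, r in neighbours), default=math.inf)
    if tc == 0.0:
        return math.inf
    return w / tc + math.hypot(v_pref[0] - v[0], v_pref[1] - v[1])

def choose_velocity(pos, v_pref, v_max, neighbours, radius, w=1.0):
    """Pick the sampled candidate with the lowest penalty (speed <= v_max)."""
    candidates = [v_pref] + [
        (v_max * k / 3 * math.cos(2 * math.pi * j / 16),
         v_max * k / 3 * math.sin(2 * math.pi * j / 16))
        for k in (1, 2, 3) for j in range(16)]
    return min(candidates, key=lambda v: penalty(v, pos, v_pref, neighbours, radius, w))
```

Each neighbour is given as a `(position, velocity, radius)` tuple. With no neighbours the desired velocity is returned unchanged; with a neighbour approaching head-on, a candidate that steers sideways receives a lower penalty than the desired velocity.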
Step 4: generating realistic crowd animation
The realistic animation simulation system is a cross-platform simulation system developed on XNA technology. The three-dimensional realistic rendering platform mainly consists of Microsoft .NET Framework 4.0 and XNA 4.0; the scene and the motion paths are imported into this platform to generate the animation.
Example two
The object of this embodiment is to provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
acquiring real video data, and performing crowd tracking according to the video data;
dividing the crowd into groups according to motion similarity, and obtaining the path information of each group;
initializing scene information and crowd positions, and dividing the crowd into groups;
training an optimal strategy with reinforcement learning, in combination with the path information of the groups in the video, to realize path planning for the groups in the scene;
generating the paths of the individuals in each group according to the path information of that group;
performing collision detection among individuals, coupled in real time with the crowd motion in the video data, to realize crowd motion simulation.
Example three
The object of this embodiment is to provide a computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following steps:
acquiring real video data, and performing crowd tracking according to the video data;
dividing the crowd into groups according to motion similarity, and obtaining the path information of each group;
initializing scene information and crowd positions, and dividing the crowd into groups;
training an optimal strategy with reinforcement learning, in combination with the path information of the groups in the video, to realize path planning for the groups in the scene;
generating the paths of the individuals in each group according to the path information of that group;
performing collision detection among individuals, coupled in real time with the crowd motion in the video data, to realize crowd motion simulation.
One or more of the above embodiments have the following technical effects:
The present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method uses a group generation algorithm to model crowd grouping behavior, combines video data with a reinforcement learning method to obtain group motion paths, and uses position offset factors to obtain individual paths. The method can realistically simulate the social behavior of a crowd in a dynamic environment, reflect the self-organization phenomenon of the crowd, reduce the dependence on the amount of real data, and provide a reference for formulating crowd evacuation plans.
It will be appreciated by those skilled in the art that the modules or steps of the present application described above may be implemented by general-purpose computer means, alternatively they may be implemented by program code executable by computing means, so that they may be stored in storage means and executed by computing means, or they may be fabricated separately as individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated as a single integrated circuit module. The present application is not limited to any specific combination of hardware and software.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
While the foregoing description of the embodiments of the present application has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the application, but rather, it is intended to cover all modifications or variations which may be resorted to without undue burden to those skilled in the art, having the benefit of the present application.

Claims (4)

1. The crowd evacuation simulation method integrating data driving and reinforcement learning is characterized by comprising the following steps of:
step 1: initializing crowd position and scene information;
step 2: establishing a clustering behavior modeling method based on data driving, which comprises the following steps:
step 2.1: crowd tracking is carried out based on video data, and individual track information is obtained;
step 2.2: defining individual similarities based on the similarities in distance and speed; in the group movement process, the similarity s (i, j) between the individuals i, j is calculated as follows:
Figure FDA0004168073960000011
wherein dis(i, j), vel(i, j) and ori(i, j) represent the distance similarity, velocity similarity and direction similarity, respectively; $w_d$, $w_v$ and $w_o$ represent the weights of the distance similarity, velocity similarity and direction similarity, respectively, with $w_d + w_v + w_o = 1$;
step 2.3: grouping the crowd according to the individual similarity;
step 2.4: calculating the group path information, specifically:
$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$,
Figure FDA0004168073960000012
wherein $p'_c$ represents the position information of a group at time $c$, and
Figure FDA0004168073960000013
are the weighted values of the x coordinate and the y coordinate, respectively;
step 3: a path planning method integrating reinforcement learning and data driving, comprising three parts: the first part combines the group path information in the video with the Q-Learning algorithm of reinforcement learning to train an optimal strategy, thereby realizing group path planning; the second part obtains individual paths through position offset factors during the underlying individual movement; the third part avoids collisions between individuals by using the reciprocal velocity obstacles (RVO) method; specifically:
step 3.1: acquiring initialized crowd grouping information;
step 3.2: fusing the Q-Learning algorithm of reinforcement learning with the group data in the video to train an optimal strategy; given a target state $s_{goal}$, the action-value function is learned through the simple value-iteration update of the Q-Learning algorithm, comprising:
state set: defining the crowd state set $S_i$, with state $s \in S_i$, where $s = (x, y)$ represents the position of the pedestrian at each moment, and $s_{goal}$ is defined as the target state; the crowd stops moving upon reaching the target state;
action set: defining an action set $A_i$ for each state, with action $a \in A_i$, where $a$ denotes the state to be selected next and $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; taking an action in state $s = (x, y)$ yields a new state $s' = (x', y')$, expressed by the position of the center point of the next grid cell;
reward function: given a target state $s_{goal}$, the reward for taking action $a \in A_i$ in state $s \in S_i$ is:
Figure FDA0004168073960000014
wherein $r$ is the immediate reward value; $s$ and $s'$ represent the current state and the new state, respectively; $w_1$ and $w_2$ are the weights of the distance function and the similarity function, respectively; $D$ is the difference between the distance from the current state position to the target point and the distance from the next state position to the target point; $r_d$ and $r_{sim}$ are the distance function and the similarity function, respectively; the distance function is expressed as:
$$r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$$
wherein $s$ and $s'$ represent the current state and the new state, respectively, and the function pathdist() gives the distance from a position state to the target position state; a larger value of $r_d$ indicates that the step is closer to the target position, i.e., a larger reward is generated;
the similarity function is expressed as:
Figure FDA0004168073960000015
wherein $a$ represents the position vector from the current state to the next-time state in the real data, and $b$ represents the position vector from the current state to the next-time state in the simulation process; $r_{sim}$ uses cosine similarity to measure the similarity between each simulated pedestrian step and the corresponding step in the video; the closer the value of $r_{sim}$ is to 1, the more similar the pedestrian trajectories are, i.e., the larger the reward generated;
training an optimal strategy: given a target state $s_{goal}$, the action-value function is learned through the simple value-iteration update of the Q-Learning algorithm:
Figure FDA0004168073960000021
wherein $s$ and $a$ represent the current state and action, $s'$ and $a'$ represent the next state and action following $s$, $r(s, a)$ represents the immediate reward for taking action $a$ in state $s$, $Q(s, a)$ represents the trained path estimate, $\max Q(s', a')$ represents the optimal path estimate, $\alpha$ is the learning rate, and $\lambda$ is the decay factor, which indicates the importance of future rewards relative to the current reward;
the Q-Learning algorithm of reinforcement learning is used to plan paths for crowd groups, and the algorithm comprises the following steps:
input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$
output: $Q$
initializing the Q matrix;
randomly selecting a state $s \in S_i$ as the initial state;
selecting an action $a \in A_i$ from the action set;
calculating the immediate reward $r(s, a)$ and the next state $s'$;
updating the Q matrix by equation (1) until $s = s_{goal}$;
if the Q matrix no longer changes within the maximum number of iterations $E_{max}$, outputting $Q$;
step 3.3: on the basis of the obtained group paths, two social groups are proposed: a linear group and a leader-follower group; the walking trajectory of each pedestrian is obtained through the position offset factor $(\Delta x, \Delta y)$, specifically:
assume the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group can then be expressed as: $\{(x_0 + \delta_1 \Delta x,\ y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\ y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\ y_n + \delta_2 \Delta y)\}$
wherein $\delta_1$ and $\delta_2$ are influence factors; when the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ satisfy a first-order functional relationship and take values in $[-1, 1]$; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$;
step 3.4: the RVO algorithm is used to avoid collisions among individuals and is coupled in real time with the crowd motion, so that crowd motion simulation is realized;
step 4: generating realistic crowd animation.
2. The crowd evacuation simulation method integrating data driving and reinforcement learning according to claim 1, wherein grouping the crowd according to individual motion similarity comprises:
initializing m individuals as cluster centers; for each remaining individual, calculating its individual motion similarity to each of the m cluster centers and assigning it to the cluster with the largest similarity value;
for each cluster, generating a virtual cluster center, calculating the individual motion similarity between each individual in the cluster and the virtual cluster center, and taking the individual with the highest similarity to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;
iteratively updating the m cluster centers until they no longer change.
3. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.
4. A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.
CN201811382707.8A 2018-11-20 2018-11-20 Crowd evacuation simulation method and system integrating data driving and reinforcement learning Active CN109543285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811382707.8A CN109543285B (en) 2018-11-20 2018-11-20 Crowd evacuation simulation method and system integrating data driving and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811382707.8A CN109543285B (en) 2018-11-20 2018-11-20 Crowd evacuation simulation method and system integrating data driving and reinforcement learning

Publications (2)

Publication Number Publication Date
CN109543285A CN109543285A (en) 2019-03-29
CN109543285B true CN109543285B (en) 2023-05-09

Family

ID=65848584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811382707.8A Active CN109543285B (en) 2018-11-20 2018-11-20 Crowd evacuation simulation method and system integrating data driving and reinforcement learning

Country Status (1)

Country Link
CN (1) CN109543285B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109974737B (en) * 2019-04-11 2020-01-31 山东师范大学 Route planning method and system based on combination of safety evacuation signs and reinforcement learning
CN110956684B (en) * 2019-11-27 2023-07-28 山东师范大学 Crowd movement evacuation simulation method and system based on residual error network
CN111988744B (en) * 2020-08-31 2022-04-01 重庆邮电大学 Position prediction method based on user moving mode
CN112348285B (en) * 2020-11-27 2021-08-10 中国科学院空天信息创新研究院 Crowd evacuation simulation method in dynamic environment based on deep reinforcement learning
CN113177535A (en) * 2021-05-28 2021-07-27 视伴科技(北京)有限公司 Method and device for simulating crowd queue form in event activity
CN113536597B (en) * 2021-08-12 2024-02-20 浙江大学 Speed-based dynamic crowd simulation method optimized through data driving
CN115239567B (en) * 2022-09-19 2023-01-06 中国汽车技术研究中心有限公司 Automobile collision dummy model scaling method
CN118261054A (en) * 2024-04-09 2024-06-28 哈尔滨工业大学 Method for determining steady-state speed of various crowds in real evacuation movement based on deep learning

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468801A (en) * 2014-09-09 2016-04-06 中国科学院深圳先进技术研究院 Simulation method and system for crowd evacuation in public place
JP5996689B2 (en) * 2015-02-13 2016-09-21 株式会社構造計画研究所 Evacuation simulation apparatus, evacuation simulation method and program
CN107403049B (en) * 2017-07-31 2019-03-19 山东师范大学 A kind of Q-Learning pedestrian's evacuation emulation method and system based on artificial neural network
CN107464021B (en) * 2017-08-07 2019-07-23 山东师范大学 A kind of crowd evacuation emulation method based on intensified learning, device
CN107463751B (en) * 2017-08-10 2021-01-08 山东师范大学 Crowd grouping evacuation simulation method and system based on binary DBSCAN clustering algorithm
CN108446469B (en) * 2018-03-07 2022-02-08 山东师范大学 Video-driven group behavior evacuation simulation method and device
CN108491598B (en) * 2018-03-09 2022-04-01 山东师范大学 Crowd evacuation simulation method and system based on path planning
CN108491972A (en) * 2018-03-21 2018-09-04 山东师范大学 A kind of crowd evacuation emulation method and device based on Sarsa algorithms

Also Published As

Publication number Publication date
CN109543285A (en) 2019-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231129

Address after: No. 1823, Building A2-5, Hanyu Jingu, No. 7000 Jingshi East Road, High tech Zone, Jinan City, Shandong Province, 250000

Patentee after: Shandong Data Trading Co.,Ltd.

Address before: 250014 No. 88, Wenhua East Road, Lixia District, Shandong, Ji'nan

Patentee before: SHANDONG NORMAL University
