CN109543285B

CN109543285B - Crowd evacuation simulation method and system integrating data driving and reinforcement learning

Info

Publication number: CN109543285B
Application number: CN201811382707.8A
Authority: CN
Inventors: 张桂娟; 姚珍珍; 陆佃杰; 刘弘
Original assignee: Shandong Normal University
Current assignee: Shandong Data Trading Co ltd
Priority date: 2018-11-20
Filing date: 2018-11-20
Publication date: 2023-05-09
Anticipated expiration: 2038-11-20
Also published as: CN109543285A

Abstract

The invention discloses a crowd evacuation simulation method and system integrating data driving and reinforcement learning. The method comprises the following steps: acquiring real video data and tracking the crowd in the video; dividing the crowd into groups according to motion similarity and obtaining the path information of each group; initializing scene information and crowd positions, and dividing the simulated crowd into groups; training an optimal strategy by combining the path information of the groups in the video with reinforcement learning, so as to plan the paths of the groups in the scene; generating the paths of the individuals in each group from the path information of that group; and performing collision detection among individuals and coupling the result in real time with the crowd motion in the video data to realize crowd motion simulation. The invention can realistically simulate the social behavior of a crowd in a dynamic environment, reproduce the self-organization of the crowd, reduce the dependence on the quantity of real data, and provide a reference for drawing up crowd evacuation plans.

Description

Crowd evacuation simulation method and system integrating data driving and reinforcement learning

Technical Field

The disclosure belongs to the field of crowd evacuation computer simulation, and particularly relates to a crowd evacuation simulation method integrating data driving and reinforcement learning.

Background

Group modeling and simulation is a research field that has attracted growing attention from industry, academia and government departments in recent years. With the rapid development of China's social economy, living standards keep rising and people travel more frequently. In public places with dense crowds, such as railway stations, tourist attractions and shopping plazas, pedestrian flow can become very large within a short time. A small disturbance in the crowd can then greatly reduce evacuation efficiency and poses a serious safety hazard; if the crowd is not effectively controlled, crushing and trampling incidents can easily occur.

Simulating how a real crowd evacuates when a crisis occurs therefore makes it possible to identify and avoid potential crushing and trampling risks in advance, and has significant research value. Traditional crowd evacuation simulation methods mainly aim at improving evacuation efficiency in a specific scene, but their many assumptions reduce the realism of the crowd motion. Existing data-driven methods improve the realism of crowd simulation, but depend heavily on the quantity of data, and most of them cannot adapt to dynamic scenes.

Disclosure of Invention

To overcome the defects of the prior art, the present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method first extracts real data from video and models crowd behavior from it. It then incorporates reinforcement learning so that the limited real data can be used to drive crowd motion in different scenes. Crowd motion is thereby simulated more realistically, providing a reference for drawing up crowd evacuation plans.

To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

a crowd evacuation simulation method integrating data driving and reinforcement learning comprises the following steps:

acquiring real video data, and carrying out crowd tracking according to the video data;

group division is carried out on the crowd according to the motion similarity, and path information of each group is obtained;

initializing scene information and crowd positions, and dividing the crowd into groups;

according to the path information of the group in the video, combining reinforcement learning training optimal strategy to realize path planning of the group in the scene;

generating paths of individuals in the groups according to the path information of each group;

and performing collision detection among individuals, and performing real-time coupling with crowd motion in video data to realize crowd motion simulation.

Further, grouping the population according to the motion similarity includes:

initializing m individuals as cluster centers; for each of the remaining individuals, calculating its motion similarity to each of the m cluster centers and assigning it to the cluster with the largest similarity value;

for each cluster, generating a virtual cluster center, calculating the motion similarity between every individual in the cluster and the virtual cluster center, and taking the individual most similar to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;

iteratively updating the m cluster centers until they no longer change.

Further, the path information $E_r$ of a group is calculated as:

$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$

where $p'_c$ represents the position of the group at time $c$; $\omega_i$ and $\omega_j$ are the weights of the x coordinate and the y coordinate, respectively, of the $i$-th individual in the group, with $1 \le i \le n$, and $n$ is the number of individuals in the group.

Further, the motion similarity is expressed as:

where dis(i, j), vel(i, j) and ori(i, j) represent the distance similarity, speed similarity and direction similarity, respectively; $w_d$, $w_v$ and $w_o$ are the corresponding weights, with $w_d + w_v + w_o = 1$.

Further, planning the paths of the groups in the scene includes:

discretizing the scene into a grid and representing each position state by the center of its cell; defining an action set $A_i$ for each state, where an action $a \in A_i$ selects the state to move to next;

constructing a reward function from the positional relationship between the state reached by an action and the target state, and from the similarity between the trajectory of the group in the scene and its trajectory in the video;

and performing path planning on each group in the scene based on the Q-learning method.

Further, performing path planning on each group in the scene based on the Q-learning method includes:

given a target state $s_{goal}$, initializing a Q matrix;

randomly selecting a state $s \in S_i$ as the initial state, and selecting an action $a \in A_i$ from the action set;

calculating the immediate reward $r(s, a)$ and the next state $s'$, and updating the Q matrix by the following formula until $s = s_{goal}$;

where $\alpha$ is the learning rate and $s'$ is the new state after taking the action in state $s$;

if the Q matrix no longer changes within the maximum number of iterations $E_{max}$, outputting Q.

Further, the reward function is as follows:

where $w_1$ and $w_2$ are the weights of the distance function $r_d$ and the similarity function $r_{sim}$, respectively, and $D$ is the difference between the distance from the current state position to the target position and the distance from the next state position to the target position;

distance function: $r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$

where $s$ is the current state, $s'$ is the state after an action is performed, and $\mathrm{pathdist}(s_{goal}, s)$ represents the distance from state $s$ to the target state;

similarity function:

the group trajectory in the real video is mapped to the stereoscopic space in which the scene is located, where a represents the position vector from the current state to the next state in the video, and b represents the position vector from the current state to the next state in the simulation.

Further, the method for generating the path of the individual in the group according to the path information of each group is as follows:

assume that the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group is then $\{(x_0 + \delta_1 \Delta x, y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x, y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x, y_n + \delta_2 \Delta y)\}$;

where $\delta_1$ and $\delta_2$ are influence factors. When the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ are related by a first-order (linear) function and take values in $[-1, 1]$; when the group is a leader-following group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$.

Further, the inter-individual collision detection employs RVO technology.

One or more embodiments provide a computer readable storage medium having a computer program stored thereon, wherein the program when executed by a processor implements the crowd evacuation simulation method of fused data driving and reinforcement learning.

One or more embodiments provide a computer system including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the crowd evacuation simulation method integrating data driving and reinforcement learning when executing the program.

The one or more of the above technical solutions have the following beneficial effects:

The present disclosure provides a crowd evacuation simulation method integrating data driving and reinforcement learning. The method models crowd grouping behavior with a group generation algorithm, obtains group motion paths by combining video data with reinforcement learning, and obtains individual paths through a position offset factor. It can realistically simulate the social behavior of a crowd in a dynamic environment, reproduce the self-organization of the crowd, reduce the dependence on the quantity of real data, and provide a reference for drawing up crowd evacuation plans.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a flowchart of a crowd evacuation simulation method integrating data driving and reinforcement learning according to an embodiment of the disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments and features of embodiments in this application may be combined with each other without conflict.

As is well known, reinforcement learning can learn powerful control strategies that imitate a wide range of example actions while adapting to changes in morphology and accomplishing user-specified goals. Fusing data-driven techniques with reinforcement learning is therefore of great significance for simulating real crowds from limited data. First, the crowd in a real video is tracked, crowd grouping behavior is modeled from distance and speed features, and the trajectories are fitted to obtain group path information, which guides the next stage and improves both the efficiency of path computation and the realism of the simulation. Second, a path planning method integrating data driving and reinforcement learning is established with a two-layer mechanism: the upper layer combines the group path information in the video with the Q-Learning algorithm in reinforcement learning to train an optimal strategy and realize group path planning; the lower layer obtains individual paths through a position offset factor based on social characteristics and avoids collisions using the Reciprocal Velocity Obstacles (RVO) method. Finally, the crowd animation is generated using a realistic rendering method.

Example 1

The embodiment discloses a crowd evacuation simulation method integrating data driving and reinforcement learning, as shown in fig. 1, comprising the following steps:

step 1: initializing crowd position and scene information;

In step 1, individual positions are initialized at random within the scene, taking both scene information and obstacle information into account so that no individual is placed on an obstacle.
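A minimal sketch of such an initialization, assuming a rectangular scene and axis-aligned rectangular obstacles. The text does not specify how the scene is represented, so `init_positions`, `inside` and all parameters are hypothetical names chosen for illustration:

```python
import random

def inside(p, rect):
    """True if point p = (x, y) lies in rect = (xmin, ymin, xmax, ymax)."""
    x, y = p
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def init_positions(n, width, height, obstacles, seed=None):
    """Rejection sampling: draw uniform positions in the scene and discard
    any that fall on an obstacle (assumes obstacles do not cover the scene)."""
    rng = random.Random(seed)
    positions = []
    while len(positions) < n:
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if not any(inside(p, ob) for ob in obstacles):
            positions.append(p)
    return positions
```

Rejection sampling keeps the distribution uniform over the free area, which matches the "random within the scene" requirement without needing an explicit description of the free space.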

Step 2: establishing a clustering behavior modeling method based on data driving;

the clustering behavior modeling method based on data driving in the step 2 comprises four parts: video capture and data extraction, definition of grouping behavior based on motion features, group generation algorithms and grouping path information computation.

Step 2.1: crowd tracking is carried out based on video data, and individual track information is obtained;

The crowd is filmed with a Xiaoyi (YI) smart camera, which records the video stream in real time.

For a real video, the crowd is tracked with the TLD (Tracking-Learning-Detection) target tracking algorithm to obtain the trajectory of each individual in the crowd. TLD integrates a tracker, a detector and a learning module, so it can track continuously moving targets and can re-detect and re-track targets that reappear after being occluded, which makes it highly robust to occlusion.

When tracking crowd video, environmental features (such as walls and obstacles) are identified first, and then the two-dimensional trajectory of each individual is tracked. If a trajectory is unsatisfactory, the user browses the video frames, specifies the position of the target in an intermediate frame, and then applies bidirectional tracking to the two separate time intervals (the start interval and the middle interval). In this way the user can adaptively correct the trajectory until a satisfactory result is obtained. After tracking is completed, the trajectory information E of each person is obtained, where each trajectory can be expressed as:

$E = \{(p_1, v_1, o_1, t_1), \ldots, (p_i, v_i, o_i, t_i), \ldots, (p_n, v_n, o_n, t_n)\}$ (1)

where $p_i = (x_i, y_i)$ is the position of the trajectory in each frame, $v_i$ is the speed of the individual at each moment, $o_i$ is the direction of the individual's velocity at each moment, $n$ is the total number of points in the trajectory, and $t_i$ is the time at which each frame is extracted.
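As an illustration of the record in equation (1), the sketch below builds the tuples $(p, v, o, t)$ from tracked positions. How speed and direction are derived from the tracker output is not stated in the text; the backward finite difference used here, and the function name `build_trajectory`, are assumptions:

```python
import math

def build_trajectory(samples):
    """samples: list of (x, y, t) sorted by t, with distinct timestamps.
    Returns a list of (p, v, o, t) tuples as in equation (1).
    ASSUMPTION: speed and direction come from a backward finite difference;
    the first point reuses the speed and direction of the second."""
    traj = []
    for k in range(1, len(samples)):
        x0, y0, t0 = samples[k - 1]
        x1, y1, t1 = samples[k]
        v = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)   # speed magnitude
        o = math.atan2(y1 - y0, x1 - x0)               # direction in radians
        traj.append(((x1, y1), v, o, t1))
    if traj:
        _, v, o, _ = traj[0]
        x, y, t = samples[0]
        traj.insert(0, ((x, y), v, o, t))
    return traj
```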

Step 2.2: defining individual similarities based on the similarities in distance and speed;

Using the video data, individual similarity is defined from the distance similarity, a sigmoid function, the speed magnitude similarity and the speed direction similarity of the grouping behavior. Specifically:

Distance similarity: the distance similarity dis(i, j) of individuals i and j is calculated using the Euclidean distance,

where $x_i$ is the abscissa of the pedestrian's position and $y_i$ is its ordinate.

When evaluating speed and direction similarity, the sigmoid function shown in formula (3) is applied in the speed difference amplitude function and the angle difference amplitude function, so that each similarity result is mapped into [0, 1]. This speeds up the training of the grouping model and avoids excessively large differences among the mapped values.

Speed magnitude similarity: the speed magnitude similarity between individuals i and j is represented by a speed difference amplitude function vel(i, j); the smaller the result, the more similar the individuals.

where $v_i - v_j$ is the difference in speed between individuals i and j; the difference is squared to make it more pronounced.

Speed direction similarity: the direction similarity between individuals i and j is represented by an angle difference amplitude function ori(i, j); the smaller the result, the more similar the individuals.

where $o_i - o_j$ is the difference in velocity direction between individuals i and j; the difference is squared to make it more pronounced.

Finally, during group movement, the similarity s(i, j) between individuals i and j is calculated as follows:

where $w_d + w_v + w_o = 1$; $w_d$ is the weight of the distance similarity feature, $w_v$ is the weight of the speed magnitude similarity feature, and $w_o$ is the weight of the speed direction similarity feature. The larger the value of s(i, j), the greater the similarity between the individuals.
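The individual formulas are not reproduced in this text, so the following is only a sketch of one way the described quantities could be computed. The Euclidean distance and the sigmoid of a squared difference follow the description; how the three terms are rescaled so that a larger s(i, j) means more similar is an assumption made here, as are the default weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dis(pi, pj):
    """Euclidean distance between positions pi = (x, y) and pj."""
    return math.hypot(pi[0] - pj[0], pi[1] - pj[1])

def vel(vi, vj):
    """Speed difference amplitude: sigmoid of the squared speed difference."""
    return sigmoid((vi - vj) ** 2)

def ori(oi, oj):
    """Angle difference amplitude: sigmoid of the squared direction difference."""
    return sigmoid((oi - oj) ** 2)

def similarity(a, b, w_d=0.4, w_v=0.3, w_o=0.3):
    """a, b: (x, y, speed, direction); weights must sum to 1.
    ASSUMPTION: each raw term is rescaled to [0, 1] so that larger means
    more similar before the weighted sum is taken."""
    d_sim = 1.0 / (1.0 + dis(a[:2], b[:2]))
    v_sim = 2.0 * (1.0 - vel(a[2], b[2]))   # sigmoid(0) = 0.5 maps to 1
    o_sim = 2.0 * (1.0 - ori(a[3], b[3]))
    return w_d * d_sim + w_v * v_sim + w_o * o_sim
```

With this rescaling, two individuals with identical position, speed and direction receive the maximum similarity of 1, and the score decreases as any of the three differences grows.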

Step 2.3: grouping the crowd by using a group generation algorithm according to the individual similarity;

The group generation algorithm comprises the following steps:

(1) Select m individuals as cluster centers $(\beta_1, \ldots, \beta_i, \ldots, \beta_m)$;

(2) Using formula (6), calculate the similarity between each remaining individual in the crowd and the m cluster centers, and assign each remaining individual to the cluster with the largest similarity value;

(3) For each cluster, generate a virtual cluster center $O_v$ using formulas (7), (8), (9) and (10);

(4) Using formula (6), calculate the similarity $s(O_v, P_i)$ between each individual in the cluster and its virtual cluster center, where $P_i \in cluster_n$, $1 \le n \le m$, and n is the index of the cluster; select the element $P_i$ with the greatest similarity to the virtual cluster center as the new cluster center;

(5) Repeat steps (2), (3) and (4) until the m cluster centers no longer change;

(6) The crowd is divided into m groups; the group of individual i is C(i), where $C(i) \in C$, $C = \{C(1), \ldots, C(m)\}$, and C is the set of groups.

In step (4), when the cluster center of each group is updated, the virtual group center $O_v$ is generated according to formulas (7), (8), (9) and (10).

where $1 \le j \le agennum$; agennum is the total number of individuals; num is the number of individuals in the cluster; the two position symbols denote the abscissa and the ordinate of individual i, respectively; people[i].vel[j] is the speed similarity between individuals i and j; and people[i].ori[j] is the direction similarity between individuals i and j.

There are two constraints in the group generation process: (1) each group contains at least one individual; (2) each individual belongs to exactly one group.
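The steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `similarity` is a hypothetical placeholder for formula (6), the first m individuals are used as initial centers, and the virtual center is taken as the mean position, speed and direction of the cluster members, following the verbal description of the virtual cluster center:

```python
import math

def similarity(a, b, w=(0.4, 0.3, 0.3)):
    """Hypothetical placeholder for formula (6); a, b = (x, y, speed, direction)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    d = 1.0 / (1.0 + math.hypot(a[0] - b[0], a[1] - b[1]))
    v = 2.0 * (1.0 - sig((a[2] - b[2]) ** 2))
    o = 2.0 * (1.0 - sig((a[3] - b[3]) ** 2))
    return w[0] * d + w[1] * v + w[2] * o

def generate_groups(people, m, max_iter=100):
    centers = list(range(m))     # step (1): first m individuals (assumption)
    clusters = {}
    for _ in range(max_iter):
        # step (2): assign each non-center individual to the most similar center
        clusters = {c: [c] for c in centers}
        for i, p in enumerate(people):
            if i not in clusters:
                best = max(centers, key=lambda c: similarity(p, people[c]))
                clusters[best].append(i)
        # steps (3)-(4): virtual center = mean of members; the member most
        # similar to it becomes the new cluster center
        new_centers = []
        for members in clusters.values():
            n = len(members)
            virtual = tuple(sum(people[k][d] for k in members) / n for d in range(4))
            new_centers.append(max(members, key=lambda k: similarity(people[k], virtual)))
        if set(new_centers) == set(centers):   # step (5): centers unchanged
            break
        centers = new_centers
    return list(clusters.values())             # step (6): m disjoint groups
```

Because every cluster always contains its own center and each remaining individual is assigned to exactly one cluster, both constraints hold by construction.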

Step 2.4: group path information is calculated.

The group path information is calculated as:

$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$ (11)

where $p'_c$ represents the position of a group at time $c$, and $\omega_i$, $\omega_j$ are the weights of the x and y coordinates, respectively.

When deriving the group paths, possible collisions between a fitted group path and an obstacle must be considered; when such a collision occurs, the fitted position is replaced by the position of one of the individuals in the group.
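The formula for $p'_c$ is not reproduced here. Read as a weighted average of the member positions at each time step, the group path could be sketched as below; using a single normalized weight per individual for both coordinates is a simplifying assumption, and `group_path` is a hypothetical name:

```python
def group_path(member_paths, weights=None):
    """member_paths: one path per individual, each a list of k (x, y) points.
    weights: one weight per individual (ASSUMPTION: the same weight is used
    for x and y and the weights are normalized; defaults to uniform)."""
    n = len(member_paths)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    weights = [w / total for w in weights]
    k = len(member_paths[0])
    return [
        (sum(w * path[c][0] for w, path in zip(weights, member_paths)),
         sum(w * path[c][1] for w, path in zip(weights, member_paths)))
        for c in range(k)
    ]
```

The obstacle check described in the text would then be applied to each returned point, substituting the position of a group member whenever the averaged point falls on an obstacle.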

Step 3: a path planning method integrating reinforcement learning and data driving.

Step 3 provides a path planning method integrating data driving and reinforcement learning. The method consists of three parts: the first combines the group path information in the video with the Q-Learning algorithm in reinforcement learning to train an optimal strategy and realize group path planning; the second obtains individual paths through a position offset factor during the lower-layer individual movement; the third avoids collisions between individuals using the Reciprocal Velocity Obstacles method.

Step 3.1: grouping the initialized crowd by using a group generation algorithm;

step 3.2: fusing the Q-Learning algorithm in the reinforcement learning method with the group data in the video and training an optimal strategy; given a target state $s_{goal}$, the action-value function is learned through simple iterative value updates in the Q-Learning algorithm.

State set: define the crowd state set $S_i$, with state $s \in S_i$,

$s = (x, y)$ (13)

When planning a path for a group, the scene is discretized into a grid and each position state is represented by the center of its cell; discretizing the crowd state greatly reduces the number of states. Here $s = (x, y)$ is the position of the pedestrian at each moment, and $s_{goal}$ is the target state; the crowd stops moving when it reaches the target state.

Action set: an action set $A_i$ is defined for each state. An action $a \in A_i$ selects the state to move to next, where $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; from the current state s, choosing different directions leads to different states. Taking an action in state $s = (x, y)$ produces a new state $s' = (x', y')$, represented by the center point of the next grid cell.
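A small sketch of this discretization, assuming square cells of side `cell` and a coordinate system in which north is the positive y direction (both assumptions; `to_state` and `next_state` are hypothetical helper names):

```python
# The eight actions as (dx, dy) offsets measured in cells
ACTIONS = {
    "east": (1, 0), "west": (-1, 0), "south": (0, -1), "north": (0, 1),
    "southeast": (1, -1), "northeast": (1, 1),
    "southwest": (-1, -1), "northwest": (-1, 1),
}

def to_state(x, y, cell):
    """Snap a continuous position to the center of its grid cell."""
    return ((int(x // cell) + 0.5) * cell, (int(y // cell) + 0.5) * cell)

def next_state(s, action, cell):
    """Center of the neighboring cell reached by taking `action` in state s."""
    dx, dy = ACTIONS[action]
    return (s[0] + dx * cell, s[1] + dy * cell)
```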

Reward function: given a target state $s_{goal}$, the immediate reward obtained by taking action $a \in A_i$ in state $s \in S_i$ is:

where r is the immediate reward value; s and s' are the current state and the new state, respectively; $w_1$ and $w_2$ are the weights of the distance function and the similarity function, respectively; D is the difference between the distance from the current state position to the target position and the distance from the next state position to the target position; and $r_d$ and $r_{sim}$ are the distance function and the similarity function, respectively. The distance function is expressed as:

$r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$ (15)

where s and s' are the current state and the new state, respectively, and pathdist() gives the distance from a position state to the target position state. A larger value of $r_d$ means that the new state is closer to the target position, i.e. a larger reward is generated. The similarity between trajectories is calculated using cosine similarity: the individual trajectories are first mapped into stereo space, and the cosine of the angle between the two group vectors is then calculated to measure their similarity. The cosine lies in $[-1, 1]$; the closer it is to 1, the more similar the two group trajectories are. The similarity function is expressed as:

where a is the position vector from the current state to the next-moment state in the real data, and b is the position vector from the current state to the next-moment state in the simulation. $r_{sim}$ uses cosine similarity to compare each step of a pedestrian in the simulation with the corresponding step in the video; the closer $r_{sim}$ is to 1, the more similar the pedestrian trajectories, i.e. the larger the reward.
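The similarity formula itself does not appear in this text. Since $r_{sim}$ is described as the cosine similarity of the vectors a and b, it presumably takes the standard form below; this is a reconstruction from the description, not a quotation:

```latex
r_{sim} = \cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```

For example, a simulated step due east, $\mathbf{b} = (1, 0)$, compared with a video step to the northeast, $\mathbf{a} = (1, 1)$, gives $r_{sim} = 1/\sqrt{2} \approx 0.707$.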

Training the optimal strategy: given a target state $s_{goal}$, the action-value function is learned through simple iterative value updates in the Q-Learning algorithm, as follows:

where s and a are the current state and action, and s' and a' are the next state and action; r(s, a) is the immediate reward for taking action a in state s; Q(s, a) is the trained path estimate; $\max Q(s', a')$ is the optimal path estimate; $\alpha$ is the learning rate; and $\lambda$ is the decay factor, which indicates the importance of future returns relative to current returns.
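The update rule is missing from this text. The standard Q-Learning update, written with the symbols defined above, would be the following; it is given here as the likely form rather than as the exact original formula (17):

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right]
```

With $\alpha = 1$ the old estimate is replaced outright by $r(s, a) + \lambda \max_{a'} Q(s', a')$; smaller values of $\alpha$ blend the new target into the existing estimate.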

In this section, the Q-Learning algorithm in reinforcement learning is used to plan paths for the crowd groups. The algorithm steps are as follows:

Input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$

Output: Q

1: Initialize the Q matrix;

2: Randomly select a state $s \in S_i$ as the initial state;

3: Select an action $a \in A_i$ from the action set;

4: Calculate the immediate reward r(s, a) and the next state s';

5: Update the Q matrix by formula (17) until $s = s_{goal}$;

6: If the Q matrix no longer changes within the maximum number of iterations $E_{max}$, output Q.
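The algorithm steps can be sketched on a small grid as follows. Several choices here are assumptions rather than details from the text: the reward is taken as the weighted sum $w_1 r_d + w_2 r_{sim}$, the video trajectory is reduced to a single reference direction `ref`, pathdist is the Euclidean distance, actions are chosen uniformly at random during training, a move off the grid leaves the state unchanged, and the convergence check of step 6 is simplified to running a fixed number of episodes `e_max`:

```python
import math
import random

# East, west, south, north, southeast, northeast, southwest, northwest
ACTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 1), (-1, -1), (-1, 1)]

def step(s, a, size):
    """Next grid cell; a move that would leave the grid keeps the state."""
    nx, ny = s[0] + a[0], s[1] + a[1]
    return (nx, ny) if 0 <= nx < size and 0 <= ny < size else s

def reward(s, s2, goal, ref, w1=0.7, w2=0.3):
    """ASSUMED form r = w1 * r_d + w2 * r_sim; ref is the video direction."""
    r_d = (math.hypot(goal[0] - s[0], goal[1] - s[1])
           - math.hypot(goal[0] - s2[0], goal[1] - s2[1]))
    move = (s2[0] - s[0], s2[1] - s[1])
    norm = math.hypot(*move) * math.hypot(*ref)
    r_sim = (move[0] * ref[0] + move[1] * ref[1]) / norm if norm else 0.0
    return w1 * r_d + w2 * r_sim

def q_learning(size, goal, ref, alpha=0.5, lam=0.9, e_max=500, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the Q matrix to zero
    Q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in range(8)}
    for _episode in range(e_max):
        s = (rng.randrange(size), rng.randrange(size))     # step 2
        for _t in range(1000):                             # safety cap per episode
            if s == goal:
                break
            a = rng.randrange(8)                           # step 3
            s2 = step(s, ACTIONS[a], size)                 # step 4
            best_next = max(Q[(s2, b)] for b in range(8))
            # Step 5: standard Q-Learning update
            Q[(s, a)] += alpha * (reward(s, s2, goal, ref)
                                  + lam * best_next - Q[(s, a)])
            s = s2
    return Q

def greedy_path(Q, start, goal, size, max_steps=50):
    """Follow the highest-valued action from start until the goal is reached."""
    path, s = [start], start
    while s != goal and len(path) <= max_steps:
        a = max(range(8), key=lambda b: Q[(s, b)])
        s = step(s, ACTIONS[a], size)
        path.append(s)
    return path
```

For instance, `greedy_path(q_learning(5, (4, 4), (1.0, 1.0)), (0, 0), (4, 4), 5)` extracts a group path from the lower-left corner of a 5 x 5 grid toward the target in the upper-right corner.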

Step 3.3: Based on the social characteristics of crowds, two kinds of social group are defined on top of the obtained group path: the linear group and the leader-following group. The walking trajectory of each pedestrian is obtained from the position offset factor $(\Delta x, \Delta y)$.

Assume that the path sequence of a group is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$; the path sequence of a pedestrian within the group can then be expressed as:

$\{(x_0 + \delta_1 \Delta x, y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x, y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x, y_n + \delta_2 \Delta y)\}$ (18)
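A brief sketch of equation (18), together with one way of drawing the influence factors $\delta_1, \delta_2$ described earlier in the document (linearly related for a linear group, normally distributed for a leader-following group, both within $[-1, 1]$). The linear coefficients `k` and `b`, the default `mu` and `sigma`, and the clipping of samples to $[-1, 1]$ are assumptions:

```python
import random

def sample_deltas(kind, rng, mu=0.0, sigma=0.3, k=1.0, b=0.0):
    """Draw (delta1, delta2) for one pedestrian; values are clipped to [-1, 1]."""
    clip = lambda z: max(-1.0, min(1.0, z))
    if kind == "linear":
        d1 = rng.uniform(-1.0, 1.0)
        return d1, clip(k * d1 + b)      # first-order relation (assumed form)
    if kind == "leader_follower":
        return clip(rng.gauss(mu, sigma)), clip(rng.gauss(mu, sigma))
    raise ValueError(f"unknown group kind: {kind}")

def individual_path(group_path, dx, dy, d1, d2):
    """Equation (18): offset every group path point by (d1 * dx, d2 * dy)."""
    return [(x + d1 * dx, y + d2 * dy) for x, y in group_path]
```

Because the same offset is applied to every point, each pedestrian keeps a fixed position relative to the group path; collisions between pedestrians are left to the collision avoidance stage.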

Step 3.4: Collision avoidance among individuals is performed using the RVO (Reciprocal Velocity Obstacles) algorithm and coupled in real time with the crowd motion to realize crowd motion simulation.

During crowd motion, the important parameters affecting the velocity calculated for each individual i are shown in Table 1:

TABLE 1 important parameters affecting the calculated speed for each group i

In crowd motion, $AV^i$ denotes the set of admissible velocities of an individual; any $V'$ belonging to $AV^i$ satisfies:

$\lVert V_i^{pref} \rVert \le V_i^{max}$ (20)

where $penalty_i(V_i')$ is the velocity penalty value of group i, $tc(V_i')$ is the expected time to collision of group i with the surrounding people, and $\lVert V_i^{pref} - V_i' \rVert$ is the magnitude of the difference between the preferred velocity and the candidate velocity. The velocity $v_{t+1}$ of the group at the next moment must satisfy two conditions: a large time to collision with other groups and a minimal deviation from the preferred velocity.
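The penalty formula is not reproduced in this text. In the original RVO formulation the penalty of a candidate velocity is $w_i / tc(V_i') + \lVert V_i^{pref} - V_i' \rVert$, and the sketch below assumes that form. Computing the time to collision is outside its scope, so each candidate is passed in with a precomputed value; `choose_velocity` and the weight `w` are hypothetical names:

```python
import math

def penalty(v_cand, v_pref, time_to_collision, w=1.0):
    """ASSUMED RVO-style penalty: w / tc + |v_pref - v_cand| (tc must be > 0)."""
    deviation = math.hypot(v_pref[0] - v_cand[0], v_pref[1] - v_cand[1])
    return w / time_to_collision + deviation

def choose_velocity(candidates, v_pref, v_max, w=1.0):
    """candidates: list of (velocity, time_to_collision); pass math.inf when
    no collision is expected. Candidates faster than v_max are skipped."""
    feasible = [(v, tc) for v, tc in candidates if math.hypot(*v) <= v_max]
    return min(feasible, key=lambda c: penalty(c[0], v_pref, c[1], w))[0]
```

A small penalty therefore corresponds to both conditions named in the text: a long time to collision and a small deviation from the preferred velocity.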

Step 4: generating realistic crowd animation

The realistic animation system is a cross-platform simulation system developed on XNA technology. The three-dimensional real-time realistic rendering platform is built mainly on the Microsoft .NET Framework 4.0 and XNA 4.0; the scene and the motion paths are imported into this platform to generate the animation.

Example 2

An object of the present embodiment is to provide a computer-readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes:

Example 3

It is an object of the present embodiment to provide a computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing when executing the program:

One or more of the above embodiments have the following technical effects:

It will be appreciated by those skilled in the art that the modules or steps of the present application described above may be implemented by general-purpose computer means, alternatively they may be implemented by program code executable by computing means, so that they may be stored in storage means and executed by computing means, or they may be fabricated separately as individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated as a single integrated circuit module. The present application is not limited to any specific combination of hardware and software.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

While the foregoing description of the embodiments of the present application has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the application, but rather, it is intended to cover all modifications or variations which may be resorted to without undue burden to those skilled in the art, having the benefit of the present application.

Claims

1. The crowd evacuation simulation method integrating data driving and reinforcement learning is characterized by comprising the following steps of:

step 1: initializing crowd position and scene information;

step 2: establishing a clustering behavior modeling method based on data driving, which comprises the following steps:

step 2.2: defining individual similarities based on the similarities in distance and speed; in the group movement process, the similarity s (i, j) between the individuals i, j is calculated as follows:

where dis(i, j), vel(i, j) and ori(i, j) represent the distance similarity, speed similarity and direction similarity, respectively; $w_d$, $w_v$ and $w_o$ are the corresponding weights, with $w_d + w_v + w_o = 1$;

Step 2.3: grouping the people according to the individual similarity;

step 2.4: the group path information is calculated, specifically:

$E_r = (p'_1, \ldots, p'_c, \ldots, p'_k)$,

where $p'_c$ represents the position of a group at time $c$, and $\omega_i$, $\omega_j$ are the weights of the x coordinate and the y coordinate, respectively;

step 3: a path planning method integrating reinforcement learning and data driving, comprising three parts: the first part combines the group path information in the video with the Q-Learning algorithm in reinforcement learning to train an optimal strategy and realize group path planning; the second part obtains individual paths through a position offset factor during the lower-layer individual movement; the third part avoids collisions between individuals using the Reciprocal Velocity Obstacles method; the method comprises the following steps:

step 3.1: acquiring initialized crowd grouping information;

Step 3.2: fusing the Q-Learning algorithm of reinforcement learning with the group data in the video to train an optimal strategy; given a target state $s_{goal}$, the action-value function is learned through a simple value-iteration update in the Q-Learning algorithm, comprising the following:

State set: setting a crowd state set $S_i$ with states $s \in S_i$, where $s = (x, y)$ represents the position of the pedestrian at each moment; $s_{goal}$ is defined as the target state, and the crowd stops moving when it reaches the target state;

Action set: defining an action set $A_i$ for each state, where an action $a \in A_i$ selects the next state, and $A_i$ = {east, west, south, north, southeast, northeast, southwest, northwest}; taking an action in state $s = (x, y)$ yields a new state $s' = (x', y')$, expressed using the position of the center point of the next grid cell;

Reward function: given a target state $s_{goal}$, the reward for taking action $a \in A_i$ in state $s \in S_i$ is:

$$r_d = \mathrm{pathdist}(s_{goal}, s) - \mathrm{pathdist}(s_{goal}, s')$$

wherein $s$ and $s'$ represent the current state and the new state, respectively, and the function pathdist() gives the distance from a position state to the target position state; a larger value of $r_d$ indicates that the action brings the pedestrian closer to the target position, i.e., a larger reward is generated;
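As an illustration, the distance reward can be sketched in Python as follows. The passage does not state how pathdist() is measured, so straight-line (Euclidean) distance is assumed here; a distance measured around obstacles would fit the same formula.

```python
import math

def pathdist(goal, state):
    """Distance from a position state to the target state.

    Assumption: straight-line (Euclidean) distance between (x, y) points.
    """
    return math.dist(goal, state)

def distance_reward(s, s_next, goal):
    """r_d = pathdist(s_goal, s) - pathdist(s_goal, s')."""
    return pathdist(goal, s) - pathdist(goal, s_next)
```

With the target at (3, 0), stepping from (0, 0) to (1, 0) shortens the distance from 3 to 2 and earns a reward of 1, while stepping to (-1, 0) earns -1.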

The similarity function is expressed as:

$$r_{sim} = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$$

wherein $\mathbf{a}$ represents the position vector from the current state to the next-time state in the real data, and $\mathbf{b}$ represents the position vector from the current state to the next-time state in the simulation process; $r_{sim}$ uses cosine similarity to measure how closely each step of a pedestrian in the simulation matches the corresponding step in the video; the closer the value of $r_{sim}$ is to 1, the more similar the pedestrian trajectories are, i.e., the larger the reward generated;
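The cosine-similarity reward can be sketched in Python as follows. The function and parameter names are hypothetical, and returning 0 when either step has zero length is an assumption, since that case is not addressed in the text.

```python
import math

def similarity_reward(s, s_video_next, s_sim_next):
    """Cosine similarity between the video step a and the simulated step b."""
    a = (s_video_next[0] - s[0], s_video_next[1] - s[1])
    b = (s_sim_next[0] - s[0], s_sim_next[1] - s[1])
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        return 0.0  # assumption: no reward when either step has zero length
    return (a[0] * b[0] + a[1] * b[1]) / norm
```

A simulated step in the same direction as the video step scores 1, a perpendicular step scores 0, and a step in the opposite direction scores -1.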

Training the optimal strategy: given a target state $s_{goal}$, the action-value function is learned through a simple value-iteration update in the Q-Learning algorithm:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \lambda \max_{a'} Q(s', a') - Q(s, a) \right] \qquad (1)$$

wherein $s$ and $a$ represent the current state and action; $s'$ and $a'$ represent the next state and its action; $r(s, a)$ represents the direct reward value of taking action $a$ in state $s$; $Q(s, a)$ represents the path training estimate; $\max_{a'} Q(s', a')$ represents the optimal path estimate; $\alpha$ is the learning rate; and $\lambda$ is the decay (discount) factor, which indicates the importance of future rewards relative to the current reward;

Paths are planned for the crowd groups using the Q-Learning algorithm of reinforcement learning; the algorithm comprises the following steps:

Input: $S_i$, $s_{goal}$, $A_i$, $\lambda$, $\alpha$, $E_{max}$

Output: Q

Initializing the Q matrix;

Randomly selecting a state $s \in S_i$ as the initial state;

Selecting an action $a \in A_i$ from the action set;

Calculating the direct reward value $r(s, a)$ and the next state $s'$;

Updating the Q matrix by equation (1) until $s = s_{goal}$;

If the Q matrix does not change within the maximum number of iterations $E_{max}$, outputting Q;
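The steps above can be sketched in Python as a tabular Q-Learning loop on a small grid. This is a sketch under stated assumptions, not the claimed implementation: the total reward is taken as $r = r_d + r_{sim}$, the video step used for $r_{sim}$ is the one at the video path point nearest to the current cell, moves leaving the grid keep the agent in place, and the stopping rule is read as "no Q entry changed by more than `tol` for `patience` consecutive episodes, or `e_max` episodes elapsed". All function names and default values are hypothetical.

```python
import math
import random

# Eight grid moves named after the action set in the text.
ACTIONS = {
    "east": (1, 0), "west": (-1, 0), "south": (0, -1), "north": (0, 1),
    "southeast": (1, -1), "northeast": (1, 1),
    "southwest": (-1, -1), "northwest": (-1, 1),
}

def step(s, a, width, height):
    """Next grid cell; a move that leaves the grid keeps s (assumption)."""
    nx, ny = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (nx, ny) if 0 <= nx < width and 0 <= ny < height else s

def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return 0.0 if norm == 0.0 else (a[0] * b[0] + a[1] * b[1]) / norm

def video_direction(s, video_path):
    """Video step at the path point nearest to s (assumption)."""
    k = min(range(len(video_path) - 1),
            key=lambda i: math.dist(s, video_path[i]))
    return (video_path[k + 1][0] - video_path[k][0],
            video_path[k + 1][1] - video_path[k][1])

def q_learning(width, height, goal, video_path, lam=0.9, alpha=0.5,
               e_max=2000, tol=1e-6, patience=20, seed=0):
    rng = random.Random(seed)
    states = [(x, y) for x in range(width) for y in range(height)]
    starts = [s for s in states if s != goal]
    ref = {s: video_direction(s, video_path) for s in states}
    names = list(ACTIONS)
    Q = {(s, a): 0.0 for s in states for a in names}  # initialise Q
    stable = 0
    for _episode in range(e_max):
        s, change = rng.choice(starts), 0.0  # random initial state
        for _step in range(10 * width * height):  # step cap (assumption)
            if s == goal:
                break
            a = rng.choice(names)  # random exploration
            s2 = step(s, a, width, height)
            move = (s2[0] - s[0], s2[1] - s[1])
            # Assumed total reward: r_d + r_sim.
            r = math.dist(goal, s) - math.dist(goal, s2) + cosine(ref[s], move)
            target = r + lam * max(Q[(s2, b)] for b in names)
            update = alpha * (target - Q[(s, a)])  # equation (1)
            Q[(s, a)] += update
            change = max(change, abs(update))
            s = s2
        stable = stable + 1 if change < tol else 0
        if stable >= patience:
            break  # Q has stopped changing
    return Q

def greedy_path(Q, start, goal, width, height):
    """Read a route off the table by always taking the best-valued action."""
    path, s = [start], start
    while s != goal and len(path) <= width * height:
        s = step(s, max(ACTIONS, key=lambda b: Q[(s, b)]), width, height)
        path.append(s)
    return path
```

For example, `Q = q_learning(5, 5, (4, 4), [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)])` trains on a 5 x 5 grid with a diagonal video path, after which `greedy_path(Q, (0, 0), (4, 4), 5, 5)` reads a route off the learned table. How $r_d$ and $r_{sim}$ are weighted against each other affects the learned route, so the plain sum used here should be treated as a placeholder.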

Step 3.3: on the basis of the obtained group paths, two social groups are proposed: a linear group and a leader-follower group; the walking track of each pedestrian is obtained through the position offset factors $(\Delta x, \Delta y)$, specifically:

assuming the path sequence of one of the groups is $\{(x_0, y_0), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, the path sequence of a pedestrian within the group can be expressed as: $\{(x_0 + \delta_1 \Delta x,\ y_0 + \delta_2 \Delta y), \ldots, (x_i + \delta_1 \Delta x,\ y_i + \delta_2 \Delta y), \ldots, (x_n + \delta_1 \Delta x,\ y_n + \delta_2 \Delta y)\}$

wherein $\delta_1$ and $\delta_2$ are influencing factors; when the pedestrian group is a linear group, $\delta_1$ and $\delta_2$ satisfy a first-order (linear) functional relationship and take values in $[-1, 1]$; when the group is a leader-follower group, $\delta_1$ and $\delta_2$ take values in $[-1, 1]$ and follow a normal distribution: $\delta_1 \sim N(\mu, \sigma^2)$, $\delta_2 \sim N(\mu, \sigma^2)$;
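The offset scheme can be sketched in Python as follows. Several details are assumptions made for illustration: the linear relationship is taken as $\delta_2 = k\,\delta_1 + c$ with hypothetical `slope` and `intercept`, $\delta_1$ is spread evenly over $[-1, 1]$, and normal draws that fall outside $[-1, 1]$ are rejected and redrawn.

```python
import random

def individual_path(group_path, delta1, delta2, dx, dy):
    """Shift every group waypoint by (delta1 * dx, delta2 * dy)."""
    return [(x + delta1 * dx, y + delta2 * dy) for x, y in group_path]

def linear_group_factors(n, slope=0.5, intercept=0.0):
    """Factors for a linear group: delta2 is a first-order function of delta1.

    slope and intercept are hypothetical; delta2 is clamped to [-1, 1].
    """
    factors = []
    for k in range(n):
        d1 = -1.0 + 2.0 * k / (n - 1) if n > 1 else 0.0
        d2 = max(-1.0, min(1.0, slope * d1 + intercept))
        factors.append((d1, d2))
    return factors

def leader_follower_factors(n, mu=0.0, sigma=0.4, seed=0):
    """Factors for a leader-follower group, drawn from N(mu, sigma^2).

    Draws outside [-1, 1] are rejected and redrawn (assumption).
    """
    rng = random.Random(seed)

    def draw():
        while True:
            d = rng.gauss(mu, sigma)
            if -1.0 <= d <= 1.0:
                return d

    return [(draw(), draw()) for _ in range(n)]
```

Each pedestrian keeps one pair $(\delta_1, \delta_2)$ for the whole path, so its track is a translated copy of the group path; `individual_path(path, d1, d2, dx, dy)` is called once per pair returned by either factor function.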

Step 3.4: avoiding collisions among individuals by using the RVO algorithm, and coupling the result in real time with the crowd motion, thereby realizing crowd motion simulation;

Step 4: generating a realistic crowd animation.

2. The crowd evacuation simulation method integrating data driving and reinforcement learning according to claim 1, wherein grouping the crowd according to individual motion similarity comprises:

initializing m individuals as cluster centers; for each remaining individual, calculating its individual motion similarity to each of the m cluster centers, and assigning it to the cluster with the largest similarity value;

for each cluster, generating a virtual cluster center, calculating the individual motion similarity between every individual in the cluster and the virtual cluster center, and taking the individual with the highest similarity to the virtual cluster center as the new cluster center; the position of the virtual cluster center is the geometric center of all individuals in the cluster, and its speed and direction are the averages of the speeds and directions of all individuals in the cluster;
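The grouping procedure of this claim can be sketched in Python as follows. The sketch makes several assumptions that the claim does not state: the first m individuals seed the clusters, the assignment and re-centering steps repeat until the centers stop changing (capped at `max_iter`), the virtual center's speed and direction are obtained by averaging velocity vectors, and `sim` is a hypothetical weighted similarity with assumed component measures.

```python
import math

def sim(p, q, w_d=0.4, w_v=0.3, w_o=0.3):
    """Assumed weighted similarity of two (x, y, vx, vy) individuals."""
    dis = 1.0 / (1.0 + math.dist(p[:2], q[:2]))
    sp, sq = math.hypot(p[2], p[3]), math.hypot(q[2], q[3])
    vel = 1.0 / (1.0 + abs(sp - sq))
    if sp * sq == 0:
        ori = 0.5  # heading undefined for a stationary individual
    else:
        ori = (1 + (p[2] * q[2] + p[3] * q[3]) / (sp * sq)) / 2
    return w_d * dis + w_v * vel + w_o * ori

def group_crowd(people, m, max_iter=20):
    """Return (clusters, centers) as lists of indices into people."""
    centers = list(range(m))  # assumption: first m individuals are the seeds
    clusters = []
    for _ in range(max_iter):
        # Assign every individual to its most similar cluster center.
        clusters = [[] for _ in range(m)]
        for i, p in enumerate(people):
            if i in centers:
                clusters[centers.index(i)].append(i)
            else:
                best = max(range(m), key=lambda c: sim(p, people[centers[c]]))
                clusters[best].append(i)
        # Re-pick each center as the member most similar to the virtual center.
        new_centers = []
        for members in clusters:
            n = len(members)
            virtual = tuple(sum(people[i][k] for i in members) / n
                            for k in range(4))
            new_centers.append(max(members, key=lambda i: sim(people[i], virtual)))
        if new_centers == centers:
            break  # centers are stable
        centers = new_centers
    return clusters, centers
```

Because each cluster always contains its own center, no cluster can become empty. The choice of seeds matters: if all m seeds come from the same real group, the result may split that group, so a practical version would likely pick seeds that are far apart.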

3. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.

4. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the crowd evacuation simulation method integrating data driving and reinforcement learning according to any one of claims 1-2.