WO2017156443A1

WO2017156443A1 - Global optimization-based method for improving human crowd trajectory estimation and tracking

Info

Publication number: WO2017156443A1
Application number: PCT/US2017/021883
Authority: WO
Inventors: Sejong YOON; Mubbasir KAPADIA; Vladimir Pavlovic
Original assignee: Rutgers, The State University Of New Jersey
Priority date: 2016-03-10
Filing date: 2017-03-10
Publication date: 2017-09-14

Abstract

Systems and methods for human crowd trajectory estimation and tracking. The methods comprise: processing, by a computing device, sensor data for captured crowd movements to extract trajectory information therefrom; transforming the extracted trajectory information into optimized trajectory information by removing noise from the extracted trajectory information and adding estimated missing data to the extracted trajectory information; and selectively modifying the optimized trajectory information so that crowd movements defined thereby are consistent with reference crowd movements.

Description

GLOBAL OPTIMIZATION-BASED METHOD FOR HUMAN CROWD TRAJECTORY ESTIMATION AND TRACKING

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Patent Application No. 62/306,258, filed on March 10, 2016. The content of the above patent application is incorporated by reference in its entirety.

BACKGROUND

Statement of the Technical Field

[0002] The present disclosure generally concerns computing systems. More particularly, the present invention relates to implementing systems and methods for global optimization-based methods for human crowd trajectory estimation and tracking.

Description of the Related Art

[0003] There is an extensive amount of research in the modeling, simulation, and analysis of crowds at varying scales that capture both the macroscopic and microscopic aspects of crowd flow. As a precursor to pedestrian tracking, it is common to provide (or learn) a motion model which can be used to guide and improve tracking accuracy. For example, one conventional solution is known as a Social Force ("SF") model. The SF model models the dynamic social behavior of individuals in a crowd. This model is then used to improve tracking accuracy. Another conventional solution builds upon the SF model by proposing a multi-target tracking model that succeeds in uniformly including different sources of information (such as appearance, physical constraints, and the social behavior of walking people). The multi-target tracking model is built using a Conditional Random Field framework. As the multi-target tracking model cannot be globally optimized, an approximate inference strategy is adopted.

[0004] Another conventional solution employs a real-time algorithm for trajectory estimation in medium-density crowds using adaptive particle filtering, while relying on a multi-agent motion model based on velocity obstacles. The central idea behind this approach is to separate tracking from waypoint estimation, and utilize a motion model for estimation. This approach is extended in the AdaPT framework to provide real-time adaptive tracking for crowded scenes.

[0005] Yet another conventional solution employs a global optimization approach with an origin-destination prior, along with a novel Social Affinity Feature Map ("SAM") that encodes the behavioral patterns of groups of pedestrians learned from large pedestrian datasets and hence helps the association of broken tracklets. Due to the inherent limitations of single camera trackers when applied to dense crowds, newer approaches attempt to fuse the data from multiple cameras. Such an approach alleviates the single camera tracking problem by using multiple overlapping camera sensors, and is able to handle severe and persistent occlusions in dense, crowded scenes. This approach uses a modified SF model to estimate movement in unobserved areas for target re-identification. A Kalman-Consensus algorithm can be used for tracking and activity recognition in distributed camera networks.

[0006] Complementary to the crowd tracking problem is the simulation of crowd movement, with many proposed solutions in the graphics community. As described previously, crowd tracking and simulation are tightly coupled: crowd simulators may be used as motion priors to improve tracking accuracy, and the output of crowd trackers may be used to train data-driven models for crowd simulation.

[0007] Tracking of the movement of individuals in crowds is an indispensable component to crowd modelling and analysis, with applications in surveillance, crowd management, security and disaster prevention, crowd evacuation studies, as well as data-driven animation for visual effects and games. Crowd tracking is uniquely challenging since tracking must often be performed over large spatial areas, where multiple sensors need to be used to obtain sufficient coverage. Often, the information provided by these sensors (or the results of the tracking algorithms) may be noisy, and the combination of the sensors may not have complete coverage, resulting in missing or incomplete traces.

SUMMARY

[0008] The present invention concerns implementing systems and methods for human crowd trajectory estimation and tracking. The methods comprise: processing, by a computing device, sensor data for captured crowd movements to extract trajectory information therefrom; transforming the extracted trajectory information into optimized trajectory information by removing noise from the extracted trajectory information and adding estimated missing data to the extracted trajectory information; and selectively modifying the optimized trajectory information so that crowd movements defined thereby are consistent with reference crowd movements.

[0009] In some scenarios, the sensor data is generated by a camera, a microphone and/or an infrared sensor. The methods also involve processing the extracted trajectory information for each person in the crowd to compute a plurality of different energy terms. The plurality of energy terms includes, but is not limited to, a physical constraint energy term characterizing people's movements, an environmental constraint energy term characterizing collisions of individuals with obstacles, and a social constraint energy term characterizing social interactions of the individuals. A first analysis is performed using the physical constraint energy terms for each person in the crowd to determine whether the velocity and acceleration of the person's movements are consistent with reference velocity and acceleration ranges for a human within a crowd. A second analysis is performed using the environmental constraint energy terms for each person in the crowd to detect collisions of the person with obstacles. A third analysis is performed using the social constraint energy terms for each person in the crowd to detect social relationships between individuals in the crowd. The extracted trajectory information is transformed into the optimized trajectory information based on results of the first, second and third analyses. For example, the extracted trajectory information is transformed into the optimized trajectory information using an Alternating Direction Method of Multipliers ("ADMM") based technique.
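The first analysis described above can be illustrated with finite differences. The following is a minimal sketch, assuming uniformly sampled (x, y) positions; the function name `motion_is_plausible`, the time step `dt`, and the limits `max_speed` and `max_accel` are hypothetical and are not specified in this disclosure.

```python
import math


def motion_is_plausible(points, dt, max_speed, max_accel):
    """Check a track of (x, y) samples against assumed speed and acceleration limits."""
    # First differences approximate velocity between consecutive samples.
    velocities = [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    # Second differences approximate acceleration.
    accelerations = [
        ((vx1 - vx0) / dt, (vy1 - vy0) / dt)
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    ]
    speed_ok = all(math.hypot(vx, vy) <= max_speed for vx, vy in velocities)
    accel_ok = all(math.hypot(ax, ay) <= max_accel for ax, ay in accelerations)
    return speed_ok and accel_ok


# Hypothetical limits: 2 m/s and 1 m/s^2 with one sample per second.
steady_walk = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
tracker_jump = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
walk_ok = motion_is_plausible(steady_walk, dt=1.0, max_speed=2.0, max_accel=1.0)
jump_ok = motion_is_plausible(tracker_jump, dt=1.0, max_speed=2.0, max_accel=1.0)
```

A track that fails such a check would be a candidate for refinement by the optimization rather than being accepted as observed.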

[0010] In some scenarios, the reference crowd movements are determined based on historical sensor data specifying historical real human crowd movements. The optimized trajectory information is selectively modified by: computing a squared Euclidean distance between each optimized trajectory point and a reference real world trajectory point; comparing the squared Euclidean distance to a threshold value; and modifying the optimized trajectory point so that the optimized trajectory point becomes closer to the reference real world trajectory point.
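The selective modification steps listed above can be sketched as follows. This is an illustration only: the disclosure does not state how a point is moved toward the reference, so the blending factor `step`, the rule of modifying only points whose squared distance exceeds the threshold, and the function name `pull_toward_reference` are assumptions.

```python
def pull_toward_reference(optimized, reference, threshold, step=0.5):
    """Move optimized (x, y) points toward reference points when they are too far apart."""
    if len(optimized) != len(reference):
        raise ValueError("trajectories must have the same number of points")
    adjusted = []
    for (px, py), (rx, ry) in zip(optimized, reference):
        # Squared Euclidean distance between the optimized and reference points.
        squared_distance = (px - rx) ** 2 + (py - ry) ** 2
        if squared_distance > threshold:
            # Assumed update rule: move a fraction `step` of the way to the reference.
            px += step * (rx - px)
            py += step * (ry - py)
        adjusted.append((px, py))
    return adjusted


optimized_track = [(0.0, 0.0), (4.0, 0.0)]
reference_track = [(0.1, 0.0), (0.0, 0.0)]
adjusted_track = pull_toward_reference(optimized_track, reference_track, threshold=1.0)
```

In this example the first point is already within the threshold and is left unchanged, while the second point is moved halfway toward its reference.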

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] Embodiments will be described with reference to the following drawing figures, in which like numerals represent like items throughout the figures.

[0012] FIG. 1 is an illustration that is useful for understanding an exemplary global optimization based trajectory refinement framework.

[0013] FIG. 2 is an illustration of an exemplary global optimization based trajectory refinement framework given two trajectories i and j, assuming the t-th point is missing in trajectory i. The connection of the point x_i^t (shaded gray) to the energy term E_gt(x_i^t) is disconnected (red). Therefore, x_i^t will be estimated purely based on the other terms (shaded yellow).

[0014] FIGS. 3(a)-3(c) (collectively referred to as "FIG. 3") provide graphs showing the results of stability experiments on an Alternating Direction Method of Multipliers ("ADMM") algorithm with 2 agents, 5 break points (frames), and 20 random initializations. As can be seen, the optimization completes after about 100 iterations and 50-60 seconds. Note that early stops yield premature results.

[0015] FIGS. 4(a)-4(c) (collectively referred to as "FIG. 4") provide graphs showing the results of stability experiments on an ADMM algorithm with 2, 4, 6, 8, 10 agents, 5 break points (frames), and 1 random initialization. As a typical trend, the number of iterations and the computation time increase almost linearly with the number of agents.

[0016] FIG. 5 provides illustrations that are useful for understanding exemplary trajectories of simulation scenarios used to validate our framework.

[0017] FIG. 6 provides a plurality of graphs that show the performance of the present solution on simulated crowd data with noisy trajectories (SNR 50 dB). Reference, noisy trajectories are shown in black (left), and the corresponding reconstructed trajectories are shown in blue (right).

[0018] FIG. 7 provides a plurality of graphs that show an input with missing information (20% missing, missing parts marked as red, left) and reconstructed trajectories (right).

[0019] FIG. 8 provides a plurality of graphs that show a noisy input with missing information (20% missing, missing parts marked as red, left) and reconstructed trajectories (right) for the bottleneck-evaluation scenario.

[0020] FIG. 9 provides an illustration that is useful for understanding how crowd behaviors are assessed using a set of raw videos.

[0021] FIG. 10 provides an illustration that is useful for understanding an exemplary method for global optimization-based methods for human crowd trajectory estimation and tracking.

[0022] FIG. 11 is an illustration of an exemplary architecture for a system.

[0023] FIG. 12 is an illustration of an exemplary architecture for a computing device.

[0024] FIG. 13 is a flow diagram of an exemplary method for global optimization-based methods for human crowd trajectory estimation and tracking.

DETAILED DESCRIPTION

[0025] It will be readily understood that the components of the embodiments as generally described herein and illustrated in the appended figures could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of various embodiments, as represented in the figures, is not intended to limit the scope of the present disclosure, but is merely representative of various embodiments. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

[0026] The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by this detailed description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

[0027] Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present invention should be or are in any single embodiment of the invention. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, discussions of the features and advantages, and similar language, throughout the specification may, but do not necessarily, refer to the same embodiment.

[0028] Furthermore, the described features, advantages and characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, in light of the description herein, that the invention can be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the invention.

[0029] Reference throughout this specification to "one embodiment", "an embodiment", or similar language means that a particular feature, structure, or characteristic described in connection with the indicated embodiment is included in at least one embodiment of the present invention. Thus, the phrases "in one embodiment", "in an embodiment", and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

[0030] As used in this document, the singular forms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. As used in this document, the term "comprising" means "including, but not limited to".

[0031] The present solution concerns a novel global optimization-based approach to analyze human crowd movement, obtained from multiple heterogeneous sensors. The inventive approach places no restrictions on the types of movement behaviors, or the density of the crowd. The present solution operates on arbitrarily noisy sensor information (abstracted as movement traces) with missing information. The sensors can be any kind that can identify human crowd features, including but not limited to visual, aural, or infrared sensors.

[0032] In some scenarios, the present solution is used in connection with video recordings obtained by visual sensors (e.g., cameras) that can consistently observe the crowd over a long period of time. Given the tracked movement information of individuals in a crowd from multiple noisy sensors with incomplete coverage, the present solution estimates a holistic view of the entire crowd by minimizing artifacts due to sensor noise and tracking inaccuracies, and estimating missing information. The whole process is done in an end-to-end fashion. Finally, the present solution is able to operate in a decentralized fashion, where sensors can cooperate to obtain better results even in the case of hardware/software failures of either a local sensor or the central processing server.

[0033] The present solution is useful in almost all surveillance systems for traditional surveillance purposes. For example, the present solution can be used to improve the available human tracker performance, to detect a group of people performing abnormal behavior, and/or to augment or improve the performance of existing crowd monitoring systems (e.g., Placemeter available from Placemeter Inc. of New York, United States).

[0034] Tracking the movement of individuals in a crowd is an indispensable component to reconstructing crowd movement, with applications in crowd surveillance and data-driven animation. Typically, multiple sensors are distributed over a wide area and often lack complete coverage of the area, or the input is noisy due to the tracking algorithm or hardware failure. In the present document, a novel method is described that complements existing crowd tracking solutions to reconstruct a holistic view of the microscopic movement of individuals in a crowd from noisy tracked data with missing and even incomplete information. Central to the present approach is a global optimization based trajectory estimation with modular objective functions.

[0035] The crowd tracking problem depends on scale and can be divided into three approaches: a macroscopic approach; a mesoscopic approach; and a microscopic approach. Macroscopic approaches track very dense crowds in which each individual is hard to identify. These approaches typically consider the flow of crowds and are usually applied in highly dense crowd movement simulations. Mesoscopic approaches are used for tracking groups of people. These approaches consider the groups as crowd blobs. Microscopic approaches cover cases in which individuals are identified and tracked separately. In this case, the crowd tracking problem becomes a multi-target object tracking problem, or specifically a pedestrian tracking problem. An extensive amount of research on this problem has been done within the computer vision community. In particular, methods based on human behavior modeling, starting from a variant of the Discrete Choice Model ("DCM") or SF models, have been proposed with applications in robotics. The present solution addresses the problem of reconstructing crowd movement, tracked using multiple, noisy sensors.

[0036] The problem can be formulated as follows. Let us consider tracked information from multiple visual sensors, with incomplete coverage, where sensor observations may be arbitrarily noisy. Assumptions are made that: tracks are obtained from each sensor using existing pedestrian tracking solutions; and traces of individuals in the crowd across sensors have correspondence which can be uniquely identified using existing methods (e.g., the Hungarian algorithm). The resulting trajectories often have missing or noisy information due to sensor inaccuracies, algorithmic (tracking or matching) failure, or insufficient coverage due to the Field Of View ("FOV") of the sensors. To reconstruct the holistic view of the microscopic crowd motion, the reconstruction should be robust to noise and missing tracklets, while satisfying the properties desired for most people.

[0037] To address the challenges described above, a global optimization based trajectory refinement framework using ADMM is provided. This framework leverages a recently introduced message passing optimization method for multiple robot trajectory planning. In this regard, the global optimization problem is framed with two input positions (e.g., an initial input position and a goal input position) per agent (person), with modular energy functions to minimize. These energy terms encode various desirable properties of individuals in a crowd, including physical constraints (e.g., the maximum possible speed for a person), social constraints (e.g., following an efficient travel path, avoiding collisions with others) or environmental constraints (e.g., avoiding collisions with static obstacles such as walls). An overview of the process is depicted in FIGS. 1 and 10.
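The idea of minimizing a sum of modular energy functions with ADMM can be illustrated on a toy problem. The sketch below is not the message passing variant used by the present solution; it is plain consensus ADMM applied to a scalar objective, the sum over i of (x - a_i)^2, where each quadratic term stands in for one energy function holding its own copy of the variable. The function name `consensus_admm` and the parameters `rho` and `iterations` are hypothetical.

```python
def consensus_admm(targets, rho=1.0, iterations=200):
    """Minimize sum_i (x - a_i)^2 with one local copy of x per term (consensus ADMM)."""
    count = len(targets)
    local = [0.0] * count   # x_i: the copy of the variable owned by term i
    dual = [0.0] * count    # u_i: scaled dual variable for the constraint x_i = z
    consensus = 0.0         # z: the shared value all copies must agree on
    for _ in range(iterations):
        # Local step: closed-form minimizer of (x - a)^2 + (rho / 2) * (x - z + u)^2.
        local = [
            (2.0 * a + rho * (consensus - u)) / (2.0 + rho)
            for a, u in zip(targets, dual)
        ]
        # Consensus step: average of the local copies plus their duals.
        consensus = sum(x + u for x, u in zip(local, dual)) / count
        # Dual step: accumulate the remaining disagreement.
        dual = [u + x - consensus for x, u in zip(local, dual)]
    return consensus


# The minimizer of the sum of squared distances is the mean of the targets.
estimate = consensus_admm([1.0, 2.0, 6.0])
```

The same split into local steps and a consensus step is what allows each energy term, and in principle each sensor, to be processed separately.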

[0038] The efficacy of the present solution relies on two central contributions. First, a global trajectory refinement method is proposed (based on ADMM for crowd trajectory estimation) that is inherently suitable to handle noisy and missing information. Second, the crowd trajectory estimation problem is formulated as a multi-objective optimization problem using a set of modular energy terms that encode physical, social, and environment constraints. The framework is validated on synthetic data with increasing amounts of noisy and missing information, showcasing significant reconstruction improvements over baseline methods.

[0039] The present solution includes the following important technical innovations. First, the problem of crowd trajectory estimation is formulated from multiple disparate sensors as a global trajectory optimization problem which is solved using a novel ADMM based framework. The application of ADMM to the domain of crowd trajectory estimation is a novel feature of the present solution. Second, a novel objective function is defined which accounts for energy minimization, penalties for discontinuities in movement, and collisions, as well as similarity to reference data in order to learn the model of the crowd. Third, the present solution automatically learns a model from existing data and can rely on ground truth (previous observations) to improve training accuracy. This is in contrast to previous approaches which rely on expert designed heuristics. Fourth, the framework of the present solution can operate in a decentralized fashion, which makes it uniquely applicable to wide area surveillance domains.

[0040] The following EXAMPLE 1 is provided to illustrate certain embodiments of the present solution. The following EXAMPLE 1 is not intended to limit the present solution in any way.

EXAMPLE 1

Crowd Trajectory Estimation Using Global Trajectory Optimization

[0041] It is desirable to estimate an underlying motion directly from data without the need for training; thus, the framework of the present solution is unsupervised. Each crowd dataset D represents a scenario. Each crowd dataset D is a set of crowds D_n = {X_n; E}, each of which is in turn a set of trajectories X_n together with an environmental configuration E containing static obstacle information. Here, n denotes the index of the trajectory set. E is the set of obstacle positions and shapes, denoted O_k, where k = 1, ..., K and K is the total number of obstacles. M denotes the number of agents (people) and T denotes the number of timestamps for D_n. D_n can be noisy and may have missing information and/or collisions between two or more trajectories even though the actual humans did not collide. The same applies to collisions between a person and the environmental obstacles. X_n = {x_{n,i}^t}, where i = 1, ..., M and t = 0, ..., T. Notably, synchronously sampled trajectories are considered, i.e., all x_{n,i} are sampled at the same set of timestamps t of a global clock. x_{n,i}^t encodes the minimal information considered, which is the (x, y) location of agent i at a specific timestamp t. The subscript n is omitted when it is obvious, and t is omitted when unnecessary (e.g., when a statement applies equally for all t), for notational brevity. Additionally, the sensors and trackers can provide confidence level information B_n = {b_{n,i}^t} for each of these points, representing how much confidence there is in the point observation in the trajectory.
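The notation of paragraph [0041] maps naturally onto a small data structure. The following is one possible in-memory layout, given as an assumption for illustration; the class names `Crowd` and `Obstacle`, the field names, and the use of `None` for a missing observation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Obstacle:
    """One static obstacle O_k; the shape descriptor is illustrative."""
    position: Point
    shape: str


@dataclass
class Crowd:
    """One crowd D_n = {X_n; E}, with optional confidence levels B_n."""
    # X_n: agent index i -> samples for t = 0, ..., T (None marks a missing point).
    trajectories: Dict[int, List[Optional[Point]]]
    # E: static obstacle information.
    obstacles: List[Obstacle] = field(default_factory=list)
    # B_n: agent index i -> one confidence value per sample.
    confidence: Dict[int, List[float]] = field(default_factory=dict)

    def num_agents(self) -> int:
        return len(self.trajectories)

    def missing_fraction(self) -> float:
        samples = [p for track in self.trajectories.values() for p in track]
        return sum(p is None for p in samples) / len(samples)


crowd = Crowd(
    trajectories={
        1: [(0.0, 0.0), None, (2.0, 0.0), (3.0, 0.0)],
        2: [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)],
    },
    obstacles=[Obstacle(position=(1.5, 2.5), shape="wall")],
)
```

Because all trajectories are sampled on the same global clock, every agent's list has the same length and index t refers to the same instant for all agents.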

[0042] In accordance with the present solution, D is refined. In particular, the possibly noisy, missing, or corrupted estimates of X_n ∈ D are refined based on the essence of the crowd, which is encoded as energy functions in an approach that will be described in detail below. The result is a refined trajectory that is complete, artifact-free, and still able to capture the original behavior of the specific individuals in the dataset.

[0043] Now, the global optimization-based crowd trajectory estimation framework is described for reconstructing the movement of individuals in a crowd that were tracked using possibly multiple noisy sensors with incomplete coverage. In particular, the message passing optimization method (that has been previously applied to multi-robot trajectory planning) is leveraged. First, given a crowd D, a location-based energy minimization problem is considered whose objective combines a per-agent term F_i with a summation of pairwise terms Q_ij. The first term F_i can be any type of function to be minimized that encodes a desired property of the individual behavior of agent i. For example, a function is defined that measures the kinetic energy required for the i-th agent (person) to move to the next position, or a penalty function for a specific maximum velocity so that the agent maintains the desired maximum speed. The second summation term Q_ij imposes pairwise interactive constraints between two agents i and j. Constraints in setting the interactions will encode a preference for no hard collisions, where r_i describes the radius, or size, of agent i. Note that this term can be applied to impose a collision constraint between agent i and obstacle k in a similar way. Employing a mechanical analogy approach, one can formulate this as a global optimization problem that can be, for instance, solved by using a message passing variant of the alternating direction method of multipliers.
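The energy minimization problem of paragraph [0043] can be written compactly. The following is a sketch consistent with the description of the two terms, not a verbatim equation of this disclosure. The symbol names Q_ij for the pairwise term and r_i for the agent radius, the indicator form of the pairwise term, the pair ordering i < j, and the Euclidean norm are all assumptions.

```latex
\min_{x_1,\dots,x_M} \; \sum_{i=1}^{M} F_i(x_i) \;+\; \sum_{i<j} Q_{ij}(x_i, x_j),
\qquad
Q_{ij}(x_i, x_j) =
\begin{cases}
0, & \lVert x_i^t - x_j^t \rVert \ge r_i + r_j \ \text{for all } t,\\
+\infty, & \text{otherwise.}
\end{cases}
```

Under this reading, two agents are collision-free at time t when the distance between their centers is at least the sum of their radii, and replacing agent j by an obstacle k gives the corresponding environmental term.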

[0044] Based on this global optimization framework, a data-driven motion model is considered so that the estimated trajectory will best mimic the real-world trajectory while satisfying the constraints and achieving a smooth trajectory. For example, an additional minimization term is considered that minimizes the Euclidean distance between the optimizer-estimated trajectory and the realistic trajectory obtained from either a real video recording or a synthetic simulator. This approach allows a refined trajectory to be obtained with the desired constraints satisfied, thereby addressing the drawbacks of real video recordings, which are not reliable in the real world because of noisy sensors and dependence on tracker performance.

[0045] An important feature of the problem formulation is that it does not require future frames to be available and only requires estimates of the initial and goal positions. More specifically, the present solution takes as input the initial position, the (expected) goal position and the interim points, which can be initialized either randomly or with reasonable initial guesses based on the environmental structure. For example, interpolated points are used based on optimal static trajectories of each individual, computed using heuristic search methods such as A*. Here, since one goal is to refine tracker output, a partial tracking result is used with linear interpolation for initialization.
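The linear interpolation initialization mentioned at the end of paragraph [0045] can be sketched as follows, assuming a partial track in which missing samples are marked with `None` and the first and last samples (the initial and goal positions) are known. The function name `fill_by_linear_interpolation` is hypothetical.

```python
def fill_by_linear_interpolation(track):
    """Fill None entries of a list of (x, y) samples by interpolating between known neighbors."""
    if track[0] is None or track[-1] is None:
        raise ValueError("initial and goal positions must be known")
    filled = list(track)
    last_known = 0
    for t in range(1, len(track)):
        if track[t] is None:
            continue
        gap = t - last_known
        x0, y0 = track[last_known]
        x1, y1 = track[t]
        # Place the missing samples evenly on the segment between the two known points.
        for k in range(1, gap):
            weight = k / gap
            filled[last_known + k] = (x0 + weight * (x1 - x0), y0 + weight * (y1 - y0))
        last_known = t
    return filled


partial_track = [(0.0, 0.0), None, (2.0, 4.0), None, None, None, (6.0, 4.0)]
initial_guess = fill_by_linear_interpolation(partial_track)
```

The filled track is only a starting point for the optimizer; the energy terms are then free to move the interpolated points.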

[0046] The present framework operates under the following assumptions: (1) the static environment configuration is known a priori; (2) the initial position of each individual is known, along with an estimate of their desired goal location; (3) the tracked trajectories of individuals from each sensor are provided as time-stamped position traces, generated using state-of-the-art pedestrian tracking solutions; (4) the tracks across sensors have correspondence which can be identified using existing methods. However, trajectories may be arbitrarily noisy with missing and incomplete information, due to sensor inaccuracies, or errors in tracking and correspondence matching.

[0047] Agent-based Energy Term

[0048] In the tracker-output trajectory, an assumption is made that reasonably reliable initial and goal positions, along with a noisy (and possibly incomplete) trajectory, are extracted by the tracker for each agent. This assumption is not too strict, since rough estimates can be derived. For the initial position, the detector output is used. For the goal position, environment knowledge is considered for a typical exit point for the given configuration. Also, it is believed that the tracker output is correct. Not all agents' locations are necessarily available in all frames, i.e., missing trajectory sections are allowed.

[0049] The status of the i-th agent (person) at time t is modeled as a column vector x_i^t denoting the 2D (x, y) location of the agent at time t. (In another approach, a space-velocity model could be employed instead of this space-only model, if the energy function utilizes the velocity information.) The objective is then minimized for a given initial position x_i^0 and goal position x_i^T. The first term of the objective can be further decomposed into several energy terms. Here, E_k is the typical kinetic energy term, defined as the sum of distances each agent traveled, and c_k is the expected mass of the agent. The kinetic energy minimization term is further refined using n_i^t, which denotes the expected location of agent i at time t based on its locations at adjacent timestamps, so that its location is further smoothed (or regularized). E_gt is a pairwise Euclidean distance to the known trajectory and can be regularized in a similar manner. Note that this term essentially imposes a Gaussian prior over the initial tracker output. Using the notation from the E_k term, the maximum velocity term E_v with constraint is defined as a temporally smoothed trajectory term with a constrained desired maximum velocity c_v.
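The decomposition described in paragraph [0049] can be illustrated with one plausible set of formulas. These are assumptions chosen to match the prose (a kinetic energy term, a distance to the tracker output, and a speed limit) and are not the equations of this disclosure. In particular, the subscripts in E_k, c_k and E_v, the symbol y_i^t for the tracker output, the midpoint form of the expected location n_i^t, and the squared norms are hypothetical.

```latex
\begin{aligned}
F_i(x_i) &= E_k(x_i) + E_{gt}(x_i) + E_v(x_i), \\
E_k(x_i) &= c_k \sum_{t=1}^{T-1} \lVert x_i^t - n_i^t \rVert^2,
\qquad n_i^t = \tfrac{1}{2}\left(x_i^{t-1} + x_i^{t+1}\right), \\
E_{gt}(x_i) &= \sum_{t=0}^{T} \lVert x_i^t - y_i^t \rVert^2, \\
E_v(x_i) &=
\begin{cases}
0, & \lVert x_i^{t+1} - x_i^t \rVert \le c_v \ \text{for all } t,\\
+\infty, & \text{otherwise.}
\end{cases}
\end{aligned}
```

A squared-distance term of this kind equals, up to constants, the negative logarithm of a Gaussian density centered on the tracker output, which is consistent with the remark that E_gt acts as a Gaussian prior.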

[0050] Collision Cost Modeling

[0051] The collision cost term must be defined with respect to other agents in the scene. It is desirable to find each agent's next possible location close to the expected location of the agent based on the belief from other neighboring agents. Using this notion, the collision cost is defined in terms of the j-th agent's expected location of the i-th agent at time t, with η as a weight parameter for the cost. Therefore, this is the cost for movement that is too aggressive relative to what other agents expect, while satisfying the collision constraints.

[0052] Dealing with Missing Information

[0053] An important aspect of the present trajectory refinement framework is its ability to handle missing data. In the global optimization based framework, this can be easily done by removing the pairwise distance term to the input (E_gt) for missing entries, based on the confidence level B_n of the information from the sensors.

[0054] This can be intuitively interpreted as disconnecting the edges between the pairwise distance energy function E_gt for the missing point and the function that aggregates the information from the energy functions and imposes the consensus constraint in the message passing structure (although handling missing values of the input data was not considered in that structure). FIG. 2 shows an example graphical description of our missing value handling scheme.
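The missing value handling of paragraphs [0053] and [0054] amounts to leaving out the distance-to-input term for points that were not reliably observed. The sketch below illustrates this with a squared-distance data term; the function name `data_term_energy`, the confidence cutoff `min_confidence`, and the use of `None` for an absent observation are assumptions.

```python
def data_term_energy(estimate, observed, confidence, min_confidence=0.5):
    """Sum squared distances to observed points, skipping missing or low-confidence ones."""
    energy = 0.0
    for (ex, ey), observation, belief in zip(estimate, observed, confidence):
        if observation is None or belief < min_confidence:
            # The edge to the data term is disconnected for this point, so its
            # position is determined by the remaining energy terms only.
            continue
        ox, oy = observation
        energy += (ex - ox) ** 2 + (ey - oy) ** 2
    return energy


estimate = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
observed = [(0.0, 1.0), None, (2.0, 0.0)]
energy_all_trusted = data_term_energy(estimate, observed, [1.0, 0.0, 0.9])
energy_last_untrusted = data_term_energy(estimate, observed, [1.0, 0.0, 0.1])
```

With the last observation trusted, both the first and last points contribute to the energy; when its confidence drops below the cutoff, only the first point does.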

[0055] To demonstrate the utility of the proposed solution, experiments were conducted on various standard synthetic crowd simulation scenario trajectories. For the synthetic data generation, the popular SF model was used.

[0056] Non-convex global optimization

[0057] ADMM, a convex optimization algorithm, is known to give reasonably good solutions for many non-convex problems in practice. Here, the stability of the ADMM framework is shown on a non-convex problem that is closely related to the problem solved by the present solution. Experiments were conducted on the concentric circle ("CONF1") scenario, wherein agents exchange positions with agents at antipodal positions. FIGS. 3-4 show the CONF1 scenario experimental results obtained when the number of agents and the number of initializations are varied. The data shows the number of iterations, computation time and objective values. As evident from FIGS. 3-4, the proposed framework converges reasonably well in the majority of cases despite the problem being non-convex.
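A scenario of the kind described above, in which each agent must travel to the antipodal point of a circle, can be generated as follows. This is a minimal sketch assuming agents are evenly spaced on a single circle centered at the origin; the function name `concentric_circle_scenario` and the default radius are hypothetical, and the actual benchmark configuration may differ.

```python
import math


def concentric_circle_scenario(num_agents, radius=10.0):
    """Return (start, goal) pairs where every goal is the antipodal point of its start."""
    agents = []
    for index in range(num_agents):
        angle = 2.0 * math.pi * index / num_agents
        start = (radius * math.cos(angle), radius * math.sin(angle))
        # The antipodal point on a circle centered at the origin is the negated start.
        goal = (-start[0], -start[1])
        agents.append((start, goal))
    return agents


scenario = concentric_circle_scenario(4, radius=1.0)
```

Because the straight-line paths all cross at the center, this setup forces the collision terms to become active, which is what makes it a useful non-convex stress test.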

[0058] Robustness to missing tracklets and noise

[0059] To demonstrate the robustness of the present framework to missing values and noise, experiments were conducted on synthetic crowd simulation scenarios. Six (6) environment benchmarks were considered as illustrated in FIG. 5. The trajectories of the crowd for each benchmark were generated synthetically using the SF crowd simulation model. Three thousand (3000) frames per scenario were simulated using thirty to forty (30 - 40) agents. Then, the trajectories were sub-sampled so that the duration of the trajectory is roughly one hundred (100) frames.

[0060] Noisy Trajectories. TABLE 1 (left hand side) shows the reconstruction performance against the ground truth trajectory, with synthetically added noise. To demonstrate the reconstruction robustness to noise, Additive White Gaussian Noise ("AWGN") was considered at different Signal-to-Noise Ratio ("SNR") levels ranging from thirty to fifty decibels (30 - 50 dB). As a baseline comparison, a median filter was used. For noisy data experiments, an additional no-filter baseline was considered (meaning that the original corrupted signal was used without any noise removal process). A root mean squared error was used as the measure. As shown in TABLE 1, the proposed solution improves the reconstruction performance over a naive median filter by a significant margin. FIG. 6 visually compares the noisy trajectories (SNR 50 dB) and the corresponding reconstructed trajectories using the present solution on the different environment benchmarks. Confirming the quantitative results in TABLE 1, the reconstructed trajectories are noticeably smoother, lack discontinuities, and are collision-free. The present solution is thus able to effectively reconstruct trajectories while removing artifacts introduced as a result of noise.
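The noise experiment and its median filter baseline can be sketched as below. This is a minimal sketch, assuming that SNR is the ratio of mean signal power to noise power and that the median filter uses an odd sliding window with edge padding; neither detail is specified above, and the helper names `add_awgn`, `median_filter` and `rmse` are hypothetical.

```python
import numpy as np

def add_awgn(traj, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (power ratio, in dB)."""
    signal_power = np.mean(traj ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return traj + rng.normal(0.0, np.sqrt(noise_power), size=traj.shape)

def median_filter(traj, window=5):
    """Sliding-window median over time (odd window), per coordinate."""
    half = window // 2
    padded = np.pad(traj, ((half, half), (0, 0)), mode="edge")
    windows = np.stack([padded[k:k + len(traj)] for k in range(window)])
    return np.median(windows, axis=0)

def rmse(estimate, truth):
    """Root mean squared Euclidean position error over all frames."""
    return float(np.sqrt(np.mean(np.sum((estimate - truth) ** 2, axis=1))))

rng = np.random.default_rng(0)
# Synthetic ground truth: one agent walking a straight line for 100 frames.
truth = np.column_stack((np.linspace(0.0, 10.0, 100), np.linspace(0.0, 5.0, 100)))
noisy = add_awgn(truth, snr_db=30.0, rng=rng)
filtered = median_filter(noisy)
print(rmse(noisy, truth), rmse(filtered, truth))
```

Here `rmse(noisy, truth)` plays the role of the no-filter baseline and `rmse(filtered, truth)` the role of the median filter baseline.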

TABLE 1

[0061] TABLE 1: Root Mean Squared ("RMS") error (in meters) of reconstructed trajectory versus ground truth. For all cases, the performance of the proposed framework is compared with median filtering as a baseline. When applying the median filter to missing data, the data was preprocessed to fill in missing parts with linear interpolation. For the noise removal case, an additional comparison was performed with the original corrupted trajectory to demonstrate that the present framework significantly improves over doing nothing when the amount of noise is non-trivial. Note that for the missing value experiments in TABLE 1, a noise-free input was assumed; thus, the no-filter baseline was not considered.

[0062] Trajectories with Missing Information. TABLE 1 (right hand side) shows the reconstruction performance against the ground truth trajectory, where portions of the trajectory were artificially removed. Ten to thirty percent (10 - 30 %) of the trajectory of each agent was removed to evaluate the robustness of the present solution. The qualitative results shown in FIG. 7 confirm the quantitative results. The present solution is able to effectively reconstruct large portions of the missing information in the trajectories, while ensuring that the trajectories are artifact-free (without collisions and discontinuities) and still preserve the original essence of the crowd dataset.

[0063] An interesting observation is that the proposed method shows strong reconstruction performance over the baseline method when congestion exists (e.g., bottleneck-evacuation-2, bottleneck-squeeze and hallway-two-way). In the case of bottleneck-evacuation, since the goal is far away from the exit, congestion is not heavy for the agents. For the hallway case, the two-way scenario has a greater chance of collisions than the four-way scenario because the number of agents is relatively small at thirty-two (32) (8 per way). This phenomenon naturally makes sense, as the simulated trajectory will show complex turns when there is congestion, and thus linear interpolation or simple median filtering is prone to fail.
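The missing-data setup and the linear interpolation preprocessing used for the baseline can be sketched as follows. The sketch assumes that one contiguous block of frames is removed per agent and that missing frames are marked with NaN; the text above does not state how the removed portions were chosen, and `drop_segment` and `fill_linear` are hypothetical helper names.

```python
import numpy as np

def drop_segment(traj, fraction, rng):
    """Return a copy with one contiguous block of frames set to NaN."""
    num_frames = len(traj)
    gap = int(round(fraction * num_frames))
    # Keep the first and last frame observed so interpolation is well defined.
    start = rng.integers(1, num_frames - gap)
    out = traj.astype(float).copy()
    out[start:start + gap] = np.nan
    return out

def fill_linear(traj):
    """Fill NaN frames by linear interpolation over time, per coordinate."""
    t = np.arange(len(traj))
    filled = traj.copy()
    for d in range(traj.shape[1]):
        known = ~np.isnan(traj[:, d])
        filled[:, d] = np.interp(t, t[known], traj[known, d])
    return filled

rng = np.random.default_rng(0)
line = np.column_stack((np.linspace(0.0, 10.0, 100), np.linspace(0.0, 5.0, 100)))
angle = np.linspace(0.0, np.pi / 2.0, 100)
arc = np.column_stack((np.cos(angle), np.sin(angle)))  # a path with a turn

line_gappy = drop_segment(line, 0.2, rng)
line_filled = fill_linear(line_gappy)
arc_filled = fill_linear(drop_segment(arc, 0.2, rng))
arc_error = float(np.abs(arc_filled - arc).max())
```

Linear interpolation recovers the straight path exactly but cuts across the curved one, which is consistent with the observation that interpolation-based baselines struggle when trajectories contain turns.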

[0064] Noise and Missing Information. A more realistic case was considered when both the noise and missing information exist in the trajectories. TABLE 2 shows the result on some exemplary combinations of the cases. A bottleneck-evacuation scenario was selected since the present solution found it most challenging. As shown in TABLE 2, the general trend is similar to TABLE 1, providing additional evidence for the robustness of the proposed solution. Visual comparisons between the original and reconstructed trajectories can be seen in FIG. 8.

Table 2

[0065] TABLE 2: RMS error (in meters) of reconstructed trajectory versus ground truth when both the noise and missing information exist. A scenario is shown where the present solution suffered compared to the baseline median filter in TABLE 1.

[0066] Some conventional techniques comprise methods for assessing crowd motion. The difference between these conventional techniques and that described above is that the conventional techniques teach learning a "collective nature" of the crowd, thus the tracklets are not necessarily generated from one person. This is based on the sociological theory that individual characteristics may vanish when the crowd behaves. Accordingly, the conventional techniques are based on an assessment of a "crowd" or "group" rather than an individual.

[0067] In the approach described herein, crowd behavior is modeled as a generative process that is a consequence of individual, group (social) and environmental constraints. In this way, it is possible to effectively model the group behavior and better capture the outliers of the model caused by individual characteristics. The present solution provides a more robust crowd model.

[0068] The present solution learns and updates in an online fashion. Thus, when deployed in a real surveillance environment, the present solution learns from video footage that is obtained in real time from cameras, but processes this information without sending all of the (very large) video data to the processing server. In order to accomplish this, the present solution utilizes a decentralized machine learning framework.

[0069] Exemplary System Architecture

[0070] As evident from the above discussion, the present solution implements a global trajectory estimation method for reconstructing a holistic view of the movement of individuals in a crowd, tracked using multiple noisy sensors with incomplete coverage. The present solution is robust to arbitrary amounts of noise and missing trajectory information, and is able to reconstruct complete trajectories that satisfy movement, collision, and energy constraints, while inheriting the behavioral characteristics of the original crowd.

[0071] Referring now to FIG. 11, there is provided an illustration of an exemplary architecture for a system 1100. System 1100 can include more or fewer components than those shown in FIG. 11. As shown in FIG. 11, system 1100 comprises a plurality of sensors 1102-1106, a network 1110, a computing device 1112, and a database 1114. The plurality of sensors includes, but is not limited to, visual sensors (e.g., video cameras), aural sensors (e.g., microphones) and/or infrared sensors. Each type of listed sensor is well known in the art, and therefore is not described herein. Any known or to be known visual, aural and/or infrared sensor can be used herein without limitation. In all cases, the sensors generate raw sensor data indicating movements of individuals in a crowd.

[0072] The raw sensor data is then sent to the computing device 1112 over communication links 1108₁, 1108₂, 1108₃ and network 1110. Any type of known or to be known communication technology can be used herein without limitation. Network 1110 can include, but is not limited to, the Internet and/or an Intranet. The computing device 1112 processes the raw sensor data to estimate a holistic view of the entire crowd. This is achieved by: minimizing artifacts due to sensor noise and tracking inaccuracies; and estimating missing information. The whole process is done in an end-to-end fashion with no or minimal human intervention. In this regard, the computing device 1112 implements all or a portion of the methods discussed herein. A more detailed illustration of the computing device 1112 is provided in FIG. 12.

[0073] Computing device 1112 may include more or less components than those shown in FIG. 12. However, the components shown are sufficient to disclose an illustrative embodiment implementing the present solution. The hardware architecture of FIG. 12 represents one embodiment of a representative computing device configured to facilitate human crowd trajectory estimation and tracking. As such, the computing device 1112 of FIG. 12 implements at least a portion of a method for human crowd trajectory estimation and tracking.

[0074] Some or all the components of the computing device 1112 can be implemented as hardware, software and/or a combination of hardware and software. The hardware includes, but is not limited to, one or more electronic circuits. The electronic circuits can include, but are not limited to, passive components (e.g., resistors and capacitors) and/or active components (e.g., amplifiers and/or microprocessors). The passive and/or active components can be adapted to, arranged to and/or programmed to perform one or more of the methodologies, procedures, or functions described herein.

[0075] As shown in FIG. 12, the computing device 1112 comprises a user interface 1202, a Central Processing Unit ("CPU") 1206, a system bus 1210, a memory 1212 connected to and accessible by other portions of computing device 1112 through system bus 1210, and hardware entities 1214 connected to system bus 1210. The user interface can include input devices (e.g., a keypad 1250) and output devices (e.g., speaker 1252, a display 1254, and/or light emitting diodes 1256), which facilitate user-software interactions for controlling operations of the computing device 1112.

[0076] At least some of the hardware entities 1214 perform actions involving access to and use of memory 1212, which can be a Random Access Memory ("RAM"), a disk drive and/or a Compact Disc Read Only Memory ("CD-ROM"). Hardware entities 1214 can include a disk drive unit 1216 comprising a computer-readable storage medium 1218 on which is stored one or more sets of instructions 1220 (e.g., software code) configured to implement one or more of the methodologies, procedures, or functions described herein. The instructions 1220 can also reside, completely or at least partially, within the memory 1212 and/or within the CPU 1206 during execution thereof by the computing device 1112. The memory 1212 and the CPU 1206 also can constitute machine-readable media. The term "machine-readable media", as used here, refers to a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 1220. The term "machine-readable media", as used here, also refers to any medium that is capable of storing, encoding or carrying a set of instructions 1220 for execution by the computing device 1112 and that cause the computing device 1112 to perform any one or more of the methodologies of the present disclosure.

[0077] In some scenarios, the hardware entities 1214 include an electronic circuit (e.g., a processor and/or field programmable array) programmed for facilitating human crowd trajectory estimation and tracking. In this regard, it should be understood that the electronic circuit can access and run data 1226 stored in memory 1212 and a software application 1224 installed on the computing device 1112. Functions of the software application 1224 will become apparent as the discussion progresses.

[0078] Referring now to FIG. 13, there is provided a flow diagram of an exemplary method 1300 for human crowd trajectory estimation and tracking. Method 1300 begins with 1302 and continues with 1304 where at least one sensor (e.g., sensor 1102, 1104 and/or 1106 of FIG. 11) generates sensor data capturing information specifying crowd movement. The sensor can include, but is not limited to, a camera, a microphone and/or an infrared sensor. The sensor data comprises at least one crowd dataset $D$ representing a scenario. The crowd dataset $D$ includes data for a set of crowds $D_n = \{X_n, E\}$, where $X_n$ is a set of trajectories, $E$ represents an environmental configuration containing static obstacle information, and $n$ denotes the index of the trajectory set. $E$ is a set of obstacle positions and shapes, each of which is denoted as $O_k$, where $k = 1 \ldots K$ and $K$ is the total number of obstacles.

[0079] The sensor data is then communicated to a computing device (e.g., computing device 1112 of FIG. 11) that is located remote from the sensor(s). Accordingly, the computing device receives the sensor data in 1306. At the computing device, the sensor data is processed to extract trajectory information $X_n$ therefrom, as shown by 1308. The trajectory information $X_n$ includes, but is not limited to, information specifying movements of identified people in the crowd. In some scenarios, the trajectory information is defined as $X_n = \{\mathbf{x}_i^t\}$, where $i = 1 \ldots M$ and $t = 0 \ldots T$. $M$ denotes the number of agents (people) and $T$ denotes the number of timestamps for $D_n$. $\mathbf{x}_i^t$ represents the x, y location of agent $i$ at a specific time stamp $t$. Notably, the trajectory information $X_n$ is noisy (i.e., has missing or incomplete information). Methods for extracting trajectory information $X_n$ from sensor data are well known in the art, and therefore will not be described in detail herein. Any known or to be known method for extracting trajectory information from sensor data can be used here without limitation. For example, the trajectory information $X_n$ is extracted from the sensor data using adaptive particle filtering (e.g., KLD-sampling) and/or a human tracking technique (e.g., box particle filtering).
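One possible in-memory layout for such a crowd dataset is sketched below. The class names, the use of NaN to mark unobserved samples, and the disc-shaped obstacle are all assumptions made for illustration; the description above does not prescribe a storage format.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Obstacle:
    """One static obstacle O_k; a disc stands in for 'position and shape'."""
    center: tuple
    radius: float

@dataclass
class CrowdDataset:
    """D_n = {X_n, E}: trajectories plus the environment configuration."""
    trajectories: np.ndarray                       # X_n, shape (M, T + 1, 2); NaN = unobserved
    obstacles: list = field(default_factory=list)  # E = {O_1, ..., O_K}

    @property
    def num_agents(self) -> int:                   # M
        return self.trajectories.shape[0]

    @property
    def last_timestamp(self) -> int:               # T, since t runs from 0 to T
        return self.trajectories.shape[1] - 1

    def observed_fraction(self) -> float:
        """Share of (agent, frame) samples for which a position is available."""
        return float(np.mean(~np.isnan(self.trajectories).any(axis=2)))

X = np.zeros((3, 5, 2))
X[1, 2] = np.nan  # agent 1 was not observed at t = 2
data = CrowdDataset(X, [Obstacle(center=(4.0, 1.0), radius=0.5)])
```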

[0080] The extracted trajectory information $X_n$ for each person in the crowd is processed to compute a plurality of different energy terms, as shown by 1310. The energy terms include, but are not limited to, a first physical constraint energy term $F_i$ characterizing a person's movement (e.g., velocity and acceleration), a second environmental constraint energy term characterizing collisions of individuals with objects, and a third social constraint energy term characterizing social interactions of the individuals. Each of the second and third constraint energy terms is referred to herein as Q. The energy terms are defined in accordance with the above-provided Mathematical Equation (1).

[0081] The energy terms are then processed as shown by 1312-1316. 1312 involves analyzing the first physical constraint energy terms $F_i$ for each person to determine whether the velocity and acceleration of the person's movements are consistent with normal velocity and acceleration ranges for a human within a crowd. For example, 1312 involves first operations to ensure that the agent $i$ traveled a minimal distance (i.e., the agent traveled the shortest possible distance) and/or second operations to ensure that people are traveling the shortest distance possible but not too fast (e.g., greater than 5 meters per second). The first operations involve comparing the agent's traveled distance with a first pre-defined threshold value (e.g., 5 meters). The second operations involve comparing the agent's traveled distance with the first pre-defined threshold value (e.g., 5 meters) and comparing the agent's velocity with a second pre-defined threshold value (e.g., 5 meters per second). If the comparison results indicate that the agent traveled too far (e.g., greater than 5 meters) and/or too fast (e.g., greater than 5 meters per second), then the velocity and/or acceleration data associated therewith is either (a) discarded or (b) modified so that the physical constraints are satisfied. Notably, the first and/or second pre-defined threshold value can be manually defined or learned based on historical crowd data. The present solution is not limited to the particulars of this example. As such, the determination of 1312 can alternatively or additionally be made in accordance with above-provided Mathematical Equations (1), (6), (7) and (8).
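The threshold comparison in this example can be sketched as follows. This is a simplified illustration of the velocity check only and is not the energy-based formulation of Mathematical Equations (1), (6), (7) and (8); the function names, the fixed frame interval `dt`, and the step-shortening rule used for option (b) are assumptions.

```python
import numpy as np

def speed_violations(traj, dt, max_speed=5.0):
    """Boolean mask over the steps of one agent: True where speed > max_speed."""
    step = np.diff(traj, axis=0)
    speed = np.linalg.norm(step, axis=1) / dt
    return speed > max_speed

def clamp_speed(traj, dt, max_speed=5.0):
    """Option (b): shorten over-long steps so that each step obeys the limit."""
    out = traj.astype(float).copy()
    max_step = max_speed * dt
    for t in range(1, len(out)):
        step = out[t] - out[t - 1]
        length = np.linalg.norm(step)
        if length > max_step:
            out[t] = out[t - 1] + step * (max_step / length)
    return out

# One agent sampled once per second; the middle step covers 8 m in 1 s.
traj = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
flags = speed_violations(traj, dt=1.0)
fixed = clamp_speed(traj, dt=1.0)
```

In this example only the second step is flagged, and the clamped trajectory moves the third sample back to x = 6 m so that no step exceeds 5 meters per second.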

[0082] 1314 involves analyzing the second environmental constraint energy terms for each person to detect collisions of the person with obstacles (e.g., objects or persons) thereby indicating that a collision constraint is violated. This detection can be achieved by analyzing linear trajectory information for agent i to identify times when (s)he comes within a certain distance of an obstacle (e.g., another person or an object). In some scenarios, this detection is made in accordance with the above-provided Mathematical Equation (9). Mathematical Equation (9) ensures that agent i and agent j always keep at least a certain distance from each other while they are traveling from a first location to a second location. If a collision of agent i and agent j is detected, then the corresponding data is either (a) discarded or (b) modified so that arbitrary environmental pairwise constraints are satisfied (i.e., so that the data indicates that a distance between agent i and an obstacle always exceeds a pre-defined threshold value (e.g., 2 feet)).
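A minimal sketch of the distance-based detection is given below. It only flags the frames in which two agents come closer than a threshold and does not reproduce Mathematical Equation (9); the function name `collision_frames` and the threshold of 0.61 meters (roughly 2 feet) are assumptions chosen for the example.

```python
import numpy as np

def collision_frames(traj_i, traj_j, min_dist):
    """Indices of frames in which two time-aligned trajectories are too close."""
    dist = np.linalg.norm(traj_i - traj_j, axis=1)
    return np.flatnonzero(dist < min_dist)

# Two agents pass each other head-on with a 0.5 m lateral offset.
t = np.arange(5, dtype=float)
agent_i = np.column_stack((t, np.zeros(5)))
agent_j = np.column_stack((4.0 - t, np.full(5, 0.5)))
hits = collision_frames(agent_i, agent_j, min_dist=0.61)
```

Because of NumPy broadcasting, the same function also accepts the fixed position of a static obstacle, given as a length-2 array, in place of the second trajectory.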

[0083] 1316 involves analyzing the third social constraint energy terms to detect social relationships between individuals in the crowd (e.g., individuals that are following one another, and/or talking to one another, etc.). This analysis can be achieved by: obtaining a trajectory model defining group behaviors (which was generated based on collecting real world data of human interactions); and identifying groups of individuals in the crowd based on movement patterns thereof that are the same as or similar to group behavior patterns defined by the trajectory model. In some scenarios, a movement pattern is similar to a group behavior pattern defined by the trajectory model when a certain percentage (e.g., 60-99%) of the two sets of data match each other. The data associated with behaviors that are not the same as or similar to those defined by the trajectory model are either (a) discarded or (b) modified so that social constraints are satisfied.

[0084] Upon completing 1312-1316, method 1300 continues with 1318. In 1318, the extracted trajectory information $X_n$ is altered based on the results of 1312-1316 to generate optimized trajectory information $Y_n$ for each person in the crowd by removing noise from the extracted trajectory information (e.g., by minimizing the first, second and third energies) and adding estimated missing data to the extracted trajectory information $X_n$. Methods for removing noise from trajectory information and adding missing data to trajectory information are well known in the art, and therefore will not be described in detail herein. Any known or to be known method for removing noise from trajectory information and adding missing data to trajectory information can be used herein without limitation.

[0085] In some scenarios, an ADMM based technique is used in 1318. ADMM generally involves solving convex optimization problems by breaking them into smaller pieces. ADMM is used to solve the following Mathematical Equations (10).

where $X_i$ and $E$ are given and fixed, and the trajectory set index $n$ was omitted for notational brevity. So in this ADMM scenario, $X_i$ is obtained after 1310 for all $i = 1 \ldots N$, where $N$ is the total number of agents, and the environment $E$ is known. After defining the energy terms in 1312-1316, the optimal trajectory $Y_i$ for the corresponding $X_i$ is found using Mathematical Equation (10). The present solution is not limited to the ADMM scenarios. In other scenarios, other convex optimization algorithms are used as alternatives to the ADMM algorithm.
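To illustrate how ADMM breaks a problem into smaller pieces, the following sketch applies it to a toy single-agent problem. It is not Mathematical Equation (10): the objective (a data-fit term on observed frames plus a squared second-difference penalty), the parameters `lam`, `rho` and `iters`, and the NaN convention for missing frames are all assumptions chosen for the example. The problem is split into a data-fit variable `y` and a smoothness variable `z` that are tied by the constraint y = z.

```python
import numpy as np

def admm_smooth(obs, lam=20.0, rho=1.0, iters=300):
    """Denoise and fill one agent's track with scaled-form ADMM.

    obs: (T, 2) array with NaN at unobserved frames. Toy objective:
    0.5 * sum_observed |y - obs|^2 + 0.5 * lam * |D z|^2, subject to y = z.
    """
    T = len(obs)
    mask = ~np.isnan(obs)
    w = mask.astype(float)               # 1 where a measurement exists
    x = np.where(mask, obs, 0.0)
    t = np.arange(T)
    # Initialise z by linear interpolation of the observed samples.
    z = np.column_stack([np.interp(t, t[mask[:, d]], obs[mask[:, d], d])
                         for d in range(obs.shape[1])])
    u = np.zeros_like(z)                 # scaled dual variable
    D = np.diff(np.eye(T), n=2, axis=0)  # second-difference operator, (T - 2, T)
    A = lam * D.T @ D + rho * np.eye(T)  # system matrix of the z-update
    for _ in range(iters):
        y = (w * x + rho * (z - u)) / (w + rho)  # data-fit subproblem, closed form
        z = np.linalg.solve(A, rho * (y + u))    # smoothness subproblem
        u = u + y - z                            # dual update for y = z
    return z

rng = np.random.default_rng(0)
s = np.linspace(0.0, np.pi, 100)
truth = np.column_stack((np.linspace(0.0, 10.0, 100), 2.0 * np.sin(s)))
noisy = truth + rng.normal(0.0, 0.2, size=truth.shape)
gappy = noisy.copy()
gappy[45:55] = np.nan  # ten consecutive frames are missing
estimate = admm_smooth(gappy)
err_noisy = float(np.sqrt(np.mean((noisy - truth) ** 2)))
err_admm = float(np.sqrt(np.mean((estimate - truth) ** 2)))
```

Each subproblem has a closed-form solution here because both terms are quadratic. The energy terms described above include pairwise constraints between agents, which this single-agent toy does not model.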

[0086] Upon completing 1318, 1320 is performed where a fourth energy term (e.g., derived based on historical crowd movement data) is used to ensure that the optimized trajectory information specifies individuals' movements that are consistent with (or do not deviate by more than a certain amount from) pre-defined reference human crowd movements. 1320 is performed to ensure that the optimized trajectory information is generally as similar as possible to real world crowd movement. Accordingly, historical information of real human crowd data is used to modify the optimized trajectory information to compensate for unrealistic crowd movements. In some scenarios, this is achieved by: computing a squared Euclidean distance between each optimized trajectory point and a reference real world trajectory point; comparing the squared Euclidean distance to a threshold value; and modifying the optimized trajectory point so that it becomes the same as or similar to the reference real world trajectory point. Subsequently, 1322 is performed where method 1300 ends or other processing is performed.
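The three operations listed for 1320 can be sketched as follows. The sketch assumes that the optimized and reference trajectories are already aligned frame by frame and that only points exceeding the threshold are modified; the function name `pull_toward_reference` and the `blend` parameter are hypothetical. A `blend` of 1.0 makes a flagged point the same as the reference point, and smaller values make it similar to the reference point.

```python
import numpy as np

def pull_toward_reference(optimized, reference, threshold, blend=0.5):
    """Move points whose squared distance to the reference exceeds threshold.

    Returns the modified trajectory and the mask of points that were moved.
    """
    sq_dist = np.sum((optimized - reference) ** 2, axis=1)
    too_far = sq_dist > threshold
    out = optimized.astype(float).copy()
    out[too_far] += blend * (reference[too_far] - out[too_far])
    return out, too_far

optimized = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 3.0]])
reference = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
adjusted, moved = pull_toward_reference(optimized, reference, threshold=1.0)
```

Only the last point, whose squared distance of 9 exceeds the threshold, is moved; it ends up halfway between its original position and the reference point.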

[0087] Although the invention has been illustrated and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In addition, while a particular feature of the invention may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Thus, the breadth and scope of the present invention should not be limited by any of the above described embodiments. Rather, the scope of the invention should be defined in accordance with the following claims and their equivalents.

Claims

What is claimed is:

1. A method for human crowd trajectory estimation and tracking, comprising:

processing, by a computing device, sensor data for captured crowd movements to extract trajectory information therefrom;

transforming the extracted trajectory information into optimized trajectory information by removing noise from the extracted trajectory information and adding estimated missing data to the extracted trajectory information; and

selectively modifying the optimized trajectory information so that crowd movements defined thereby are consistent with reference crowd movements.

2. The method according to claim 1, wherein the sensor data is generated by a camera, a microphone or an infrared sensor.

3. The method according to claim 1, further comprising processing the extracted trajectory information for each person in the crowd to compute a plurality of different energy terms.

4. The method according to claim 3, wherein the plurality of energy terms comprises a physical constraint energy term characterizing people's movements, an environmental constraint energy term characterizing collisions of individuals with obstacles, and a social constraint energy term characterizing social interactions of the individuals.

5. The method according to claim 4, further comprising:

performing a first analysis of the physical constraint energy terms for each person in the crowd to determine whether the velocity and acceleration of the person's movements are consistent with reference velocity and acceleration ranges for a human within a crowd;

performing a second analysis of the environmental constraint energy terms for each person in the crowd to detect collisions of the person with obstacles; and

performing a third analysis of the social constraint energy terms for each person in the crowd to detect social relationships between individuals in the crowd.

6. The method according to claim 5, wherein the extracted trajectory information is transformed into the optimized trajectory information based on results of the first, second and third analysis.

7. The method according to claim 6, wherein the extracted trajectory information is transformed into the optimized trajectory information using an Alternating Direction Method of Multipliers ("ADMM") based technique.

8. The method according to claim 1, wherein the reference crowd movements are determined based on historical sensor data specifying historical real human crowd movements.

9. The method according to claim 1, wherein the optimized trajectory information is selectively modified by: computing a squared Euclidean distance between each optimized trajectory point and a reference real world trajectory point; comparing the squared Euclidean distance to a threshold value; and modifying the optimized trajectory point so that the optimized trajectory point becomes closer to the reference real world trajectory point.

10. A system, comprising:

a processor;

a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for human crowd trajectory estimation and tracking, wherein the programming instructions comprise instructions to:

process sensor data for captured crowd movements to extract trajectory information therefrom;

transform the extracted trajectory information into optimized trajectory information by removing noise from the extracted trajectory information and adding estimated missing data to the extracted trajectory information; and

selectively modify the optimized trajectory information so that crowd movements defined thereby are consistent with reference crowd movements.

11. The system according to claim 10, wherein the sensor data is generated by a camera, a microphone or an infrared sensor.

12. The system according to claim 10, wherein the programming instructions further comprise instructions to process the extracted trajectory information for each person in the crowd to compute a plurality of different energy terms.

13. The system according to claim 12, wherein the plurality of energy terms comprises a physical constraint energy term characterizing people's movements, an environmental constraint energy term characterizing collisions of individuals with obstacles, and a social constraint energy term characterizing social interactions of the individuals.

14. The system according to claim 13, wherein the programming instructions further comprise instructions to:

perform a first analysis of the physical constraint energy terms for each person in the crowd to determine whether the velocity and acceleration of the person's movements are consistent with reference velocity and acceleration ranges for a human within a crowd;

perform a second analysis of the environmental constraint energy terms for each person in the crowd to detect collisions of the person with obstacles; and

perform a third analysis of the social constraint energy terms for each person in the crowd to detect social relationships between individuals in the crowd.

15. The system according to claim 14, wherein the extracted trajectory information is transformed into the optimized trajectory information based on results of the first, second and third analysis.

16. The system according to claim 15, wherein the extracted trajectory information is transformed into the optimized trajectory information using an Alternating Direction Method of Multipliers ("ADMM") based technique.

17. The system according to claim 10, wherein the reference crowd movements are determined based on historical sensor data specifying historical real human crowd movements.

18. The system according to claim 10, wherein the optimized trajectory information is selectively modified by: computing a squared Euclidean distance between each optimized trajectory point and a reference real world trajectory point; comparing the squared Euclidean distance to a threshold value; and modifying the optimized trajectory point so that the optimized trajectory point becomes closer to the reference real world trajectory point.