CN117436937A

CN117436937A - Path prediction method and system considering pedestrian portrait

Info

Publication number: CN117436937A
Application number: CN202311764365.7A
Authority: CN
Inventors: 郑德高; 陈崇雨
Original assignee: DMAI Guangzhou Co Ltd
Current assignee: DMAI Guangzhou Co Ltd
Priority date: 2023-12-21
Filing date: 2023-12-21
Publication date: 2024-01-23
Anticipated expiration: 2043-12-21
Also published as: CN117436937B

Abstract

The invention relates to the technical field of trajectory prediction, and in particular to a path prediction method and system considering pedestrian portraits. The method comprises the following steps: extracting basic features of pedestrians from video monitoring information of a commercial area and constructing a pedestrian feature dictionary; updating the pedestrian feature dictionary based on the number of pedestrians and/or the number of pedestrian attribute features; for each pedestrian, inferring the destination to which the pedestrian is most likely to go based on the actual map of the commercial area and the updated pedestrian feature dictionary; and inputting the current speed of the pedestrian, the spatial distance between the pedestrian and other pedestrians, the distance between the pedestrian and map obstacles, and the most likely destination into a social force model, which outputs trajectory data for each pedestrian over a future period of time. The invention can improve the accuracy and interpretability of pedestrian trajectory prediction.

Description

Path prediction method and system considering pedestrian portrait

Technical Field

The invention relates to the technical field of trajectory prediction, and in particular to a path prediction method and system considering pedestrian portraits.

Background

Pedestrian trajectory prediction is of great importance in business decisions. In modern urban life, the behavior and movement patterns of people have a profound impact on the planning and decision making of a business environment. By accurately predicting the trajectory and behavior of pedestrians, an enterprise can better understand and cope with the needs, behaviors and preferences of people, thereby optimizing business decisions and providing better services. This predictive capability provides valuable information to the enterprise, which can be used to formulate more accurate business strategies, optimize store layout, improve marketing strategies, and enhance customer experience. For example, in the retail industry, accurately predicting customer behavior and shopping paths may help merchants optimize merchandise displays, promotional policies, and inventory management, improving sales and customer satisfaction.

At present, pedestrian trajectory prediction methods fall into two main categories: data-driven methods and behavior-driven methods. Data-driven methods mine the behavior characteristics of a moving target from massive historical trajectory data, then fuse and match those characteristics with the current position data to predict the target's movement trend. Common data-driven methods include Kalman filters, autoregressive integrated moving average (ARIMA) models, hidden Markov models, Gaussian mixture models, Bayesian networks, and deep learning.

The prediction accuracy of the Kalman filter (KF) depends heavily on the sampling interval and the errors of the trajectory data. When the interval between trajectory points is large or the error is large, the prediction performance of the KF model degrades noticeably. The autoregressive integrated moving average (ARIMA) model has limitations in processing nonlinear trajectory data: because it is a linear combination, it cannot accurately capture the trends of complex nonlinear trajectories. Hidden Markov model (HMM) parameters are complex to set, requiring adaptive selection or observation state parameters based on a hybrid HMM. Gaussian mixture models (GMMs) are limited in their ability to model complex data, and may lose applicability as the complexity of the trajectory data increases. The Bayesian network prediction process involves subjective information from the decision maker, and the prediction result is affected by factors such as the prior probability. When a deep learning model is used for trajectory prediction, proper parameter selection is important: if the parameters are poorly chosen, gradients may vanish or explode, which degrades the accuracy of the prediction result. Furthermore, deep learning models are typically black-box models, which makes it difficult to interpret the decision process and reasoning logic inside the model. This can be a limitation in areas where transparency and interpretability are required.

Behavior-driven moving-target trajectory prediction methods predict how the trajectory will unfold over a future time period from the relevant motion characteristics of the moving target, and generally comprise dynamic models and intention recognition.

The social force model is a commonly used dynamic model. It is based on Newtonian dynamics, expresses the different motivations and influences acting on a target as individual force terms, and can realistically describe the movement of individuals within a group. Because it expresses the motion of pedestrians in a complex environment with explicit mathematical formulas, using it in trajectory prediction can improve prediction accuracy. Intention recognition usually means adding a target-intention recognition module before trajectory prediction; HMMs and support vector machines (SVMs) are commonly used models, and adding such a module has markedly improved trajectory prediction accuracy in some work on motor vehicle trajectory prediction.

In general, pedestrian path prediction still faces a number of problems, whether a data-driven or a behavior-driven approach is used.

Therefore, how to improve the accuracy and interpretability of pedestrian trajectory prediction is a technical problem that needs to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, the present invention provides a path prediction method and system considering pedestrian portraits, which can improve the accuracy and interpretability of pedestrian trajectory prediction.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, the present invention provides a path prediction method considering pedestrian portraits, comprising the following steps:

based on video monitoring information of a commercial area, extracting basic characteristics of pedestrians, and constructing a pedestrian characteristic dictionary;

updating the pedestrian feature dictionary based on the number of pedestrians and/or the number of pedestrian attribute features;

for each pedestrian, deducing a destination to which each pedestrian is most likely to go based on a commercial area actual map and the updated pedestrian characteristic dictionary;

inputting the current speed of the pedestrian, the spatial distance between the pedestrian and other pedestrians, the distance between the pedestrian and map obstacles, and the destination to which the pedestrian is most likely to go into a social force model, and outputting trajectory data of each pedestrian over a future period of time based on the social force model.

Further, the video monitoring information is video information in a static mode or video information in a dynamic mode;

in the static mode, video signals of the same period are collected by a plurality of cameras in the commercial area and used as the video monitoring information;

in the dynamic mode, video signals are periodically collected by a plurality of cameras in the commercial area and used as the video monitoring information.

Further, the constructing a pedestrian feature dictionary includes:

image segmentation is carried out on each pedestrian in the video monitoring information;

performing feature recognition on each segmented pedestrian image;

based on the recognized features, encoding each pedestrian as a key-value pair, with the key being the identifier of the pedestrian and the value being the attribute features of the pedestrian, to construct the pedestrian feature dictionary.

Further, in a dynamic mode, periodically updating the pedestrian feature dictionary;

when the number of pedestrians and/or the number of pedestrian attribute features is less than or equal to 10, the data structure of the pedestrian feature dictionary comprises a pedestrian identifier and a plurality of pedestrian attribute features subordinate to the pedestrian identifier;

when the number of pedestrians and/or the number of pedestrian attribute features is greater than 10, the data structure of the pedestrian feature dictionary is a multi-level dictionary structure.

Further, deducing a destination to which the pedestrian is most likely to go based on the updated pedestrian feature dictionary, including:

marking the positions in the actual map of the commercial area where pedestrians may stay as potential destinations;

adding virtual edges to the actual map to construct an expanded map that takes into account three factors: spatial position features, potential destination attraction features, and pedestrian attribute features;

for any pedestrian, calculating the shortest path of the pedestrian in the expanded map, and tracing back the actual potential destination through which the shortest path passes as the destination to which the pedestrian is most likely to go.

Further, the expanding step of the actual map comprises the following steps:

constructing a virtual point outside the actual map, and connecting all potential destination points in the actual map to the virtual point to form virtual edges, where the distance on each virtual edge is the feature distance of the potential destination's attraction;

for any pedestrian, adding a plurality of sub-edges and sub-nodes on each virtual edge according to the attribute features of the pedestrian, where the distance on each sub-edge represents the feature quantity of one pedestrian attribute, thereby completing the expansion of the actual map; the expanded map contains three factors: the spatial position features of the actual map, the potential destination attraction features, and the pedestrian's own attributes.

Further, in the expanded map, the problem of choosing among multiple potential destinations is converted into a shortest-path problem: a shortest-path algorithm is called to compute the shortest path from the current position of the pedestrian to the virtual point, and the actual potential destination through which the shortest path passes is traced back.

Further, the pedestrian attribute features include one or more of: the current position of the pedestrian, the walking speed, the height range, the age interval, the clothing type, whether the pedestrian belongs to a group, and the group type; the group types of pedestrians comprise: parents and children, lovers, friends, colleagues or friends of relatives.

In a second aspect, the present invention provides a path prediction system considering pedestrian portraits, comprising:

the video monitoring module is used for extracting basic characteristics of pedestrians based on video monitoring information of the commercial area and constructing a pedestrian characteristic dictionary;

the dictionary updating module is used for updating the pedestrian characteristic dictionary based on the number of pedestrians and/or the number of pedestrian attribute characteristics;

a destination inference module, configured to infer, for each pedestrian, a destination to which each pedestrian is most likely to travel based on the updated pedestrian feature dictionary;

the track prediction module is used for inputting the current speed of the pedestrians, the space distance between the pedestrians and other pedestrians, the distance between the pedestrians and map obstacles and the destination to which the pedestrians are most likely to go into the social force model, and outputting track data of each pedestrian in a period of time in the future based on the social force model.

Compared with the prior art, the invention has the following beneficial effects:

1. High-accuracy path prediction: by integrating multiple information sources, including pedestrian portraits, multi-modal information in the map, and a social force model, the invention can effectively predict pedestrians' movement trajectories. Integrating multi-modal information in this way provides more comprehensive and accurate input data, thereby increasing the accuracy and reliability of path prediction.

2. Consideration of individual characteristics and environmental factors: the invention combines the pedestrian portrait with the multi-modal information in the map, and can fully account for the influence of individual characteristics and environmental factors on trajectory prediction. By jointly considering factors such as a pedestrian's behavior preferences, movement patterns, and geographic position, the regularities and trends of individual behavior can be captured more accurately, improving the accuracy of path prediction.

3. Social force models provide more realistic predictions: the invention introduces a social force model, which is a model taking the mutual influence relationship among people into consideration. The social force model can simulate interaction among people, group behaviors and influence of social factors, so that the track of the people can be predicted more truly. By considering the social force model, the prediction system can better capture interaction behaviors and group effects among people, and accuracy and credibility of path prediction are improved.

4. Better interpretability: the invention provides high-accuracy path prediction and simultaneously pays attention to maintaining better interpretability. The basis and reasons for the predicted outcome may be interpreted to the user to enable the business decision maker to understand and trust the predicted outcome. The benefit of interpretability is that it increases the credibility and acceptability of the system, enabling business decision makers to better utilize the predicted outcomes to support decision making.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the path prediction method considering pedestrian portraits according to the present invention;

FIG. 2 is a schematic view of the expanded map provided by the present invention;

FIG. 3 is a schematic diagram showing the expansion of an actual map in an example of a pedestrian planning to buy clothes in a clothing store;

FIG. 4 is a schematic diagram of the path prediction system considering pedestrian portraits according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in FIG. 1, the embodiment of the invention discloses a path prediction method considering pedestrian portraits, which comprises the following steps:

S1, extracting basic features of pedestrians based on video monitoring information of the commercial area, and constructing a pedestrian feature dictionary;

S2, updating the pedestrian feature dictionary based on the number of pedestrians and/or the number of pedestrian attribute features;

S3, for each pedestrian, inferring the destination to which the pedestrian is most likely to go based on the actual map of the commercial area and the updated pedestrian feature dictionary;

S4, inputting the current speed of the pedestrian, the spatial distance between the pedestrian and other pedestrians, the distance between the pedestrian and map obstacles, and the destination to which the pedestrian is most likely to go into the social force model, and outputting trajectory data of each pedestrian over a future period (on the order of minutes) based on the social force model.

In a specific embodiment, the present invention further describes the steps, specifically including:

S1, the video monitoring information acquired in the commercial area is in one of two modes: video information in a static mode or video information in a dynamic mode;

in the static mode, video signals of the same period are collected by a plurality of cameras in the commercial area and used as the video monitoring information; pedestrian trajectories are predicted using only this static-mode video signal;

in the dynamic mode, video signals are periodically collected by a plurality of cameras in the commercial area and used as the video monitoring information, and the dynamic video signals are used to predict pedestrian trajectories.

After the video monitoring information is collected, a pedestrian characteristic dictionary needs to be constructed, and the specific steps of constructing the pedestrian characteristic dictionary comprise:

S11, performing image segmentation on each pedestrian in the video monitoring information. For the segmentation algorithm, the GrabCut algorithm can be used, which separates the image into foreground and background by iterative optimization. Deep-learning image segmentation algorithms can also be adopted, such as FCN (Fully Convolutional Network), U-Net, or Mask R-CNN, as can video segmentation algorithms based on image sequences, such as optical flow segmentation or clustering-based segmentation. These algorithms have different characteristics and applicable scenarios, and the appropriate choice depends on the specific application requirements and data characteristics.

S12, performing feature recognition on each segmented pedestrian image. For feature recognition, a perception algorithm suited to the type of feature to be recognized must be selected.

S13, based on the recognized features, encoding each pedestrian as a key-value pair, with the key being the identifier of the pedestrian and the value being the attribute features of the pedestrian, to construct the pedestrian feature dictionary.

When the pedestrian feature dictionary is constructed in the static mode, its initial state is empty, and elements are gradually added as images are segmented and features are extracted. In the dynamic mode, because the video monitoring information is input periodically, the dictionary is likewise empty at the start, and in later periods adding elements becomes a process of updating the dictionary: new elements may be added, and the attribute values of existing elements may be modified. Thus, periodic updating of the pedestrian feature dictionary occurs only in the dynamic mode.
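The construction and update behaviour described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the function name `update_feature_dictionary`, the attribute keys, and the sample observations are all assumptions, and each value is modelled as a dictionary of named attributes rather than a plain list.

```python
from typing import Any, Dict

FeatureDict = Dict[str, Dict[str, Any]]


def update_feature_dictionary(dictionary: FeatureDict,
                              observations: FeatureDict) -> FeatureDict:
    """Merge one period of recognised features into the dictionary.

    Unknown pedestrian identifiers are added; for known identifiers the
    observed attribute values overwrite the stored ones.
    """
    for pedestrian_id, attributes in observations.items():
        dictionary.setdefault(pedestrian_id, {}).update(attributes)
    return dictionary


# Static mode: the dictionary starts empty and is filled from one capture.
features: FeatureDict = {}
update_feature_dictionary(features, {
    "person1": {"position": (12.0, 3.5), "speed": 1.2},
    "person2": {"position": (4.0, 8.0), "speed": 0.9},
})

# Dynamic mode: a later period adds person3 and modifies person1's position.
update_feature_dictionary(features, {
    "person1": {"position": (13.1, 3.6)},
    "person3": {"position": (0.0, 1.0), "speed": 1.4},
})
```

In the static mode only the first call would occur; in the dynamic mode the same call is repeated once per acquisition period.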

S2, updating a pedestrian feature dictionary:

in a dynamic mode, periodically updating the pedestrian feature dictionary;

when the number of pedestrians and/or the number of pedestrian attribute features is small, generally 10 or fewer, the data structure of the pedestrian feature dictionary comprises a pedestrian identifier and the pedestrian attribute features subordinate to that identifier; it can be expressed concretely as {'person1': [att1, att2, …], 'person2': [att1, att2, …]}, where 'person1', 'person2', … are pedestrian identifiers and [att1, att2, …] are the attribute features of each pedestrian;

when the number of pedestrians and/or the number of pedestrian attribute features is large, generally more than 10, the data structure of the pedestrian feature dictionary is a multi-level dictionary structure.

The pedestrian attribute features include one or more of: the current position of the pedestrian, the walking speed, the height range, the age interval, the clothing type, whether the pedestrian belongs to a group, and the group type; the group types of pedestrians comprise: parents and children, lovers, friends, colleagues or friends of relatives.
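The description does not specify how the multi-level dictionary is organised. The sketch below shows one possible reading, under stated assumptions: the grouping in `ATTRIBUTE_GROUPS`, the attribute names, and the rule that the flat form is kept only when both counts are at most 10 are hypothetical choices made for illustration.

```python
THRESHOLD = 10  # switch-over point given in the description

# Hypothetical grouping of attribute names into second-level keys.
ATTRIBUTE_GROUPS = {
    "motion": ("position", "speed"),
    "appearance": ("height_range", "age_interval", "clothing_type"),
    "group": ("in_group", "group_type"),
}
GROUPED = {name for names in ATTRIBUTE_GROUPS.values() for name in names}


def choose_structure(flat):
    """Return the flat dictionary when it is small, else a nested copy."""
    n_pedestrians = len(flat)
    n_attributes = max((len(attrs) for attrs in flat.values()), default=0)
    if n_pedestrians <= THRESHOLD and n_attributes <= THRESHOLD:
        return flat
    nested = {}
    for pedestrian_id, attrs in flat.items():
        levels = {
            group: {k: attrs[k] for k in names if k in attrs}
            for group, names in ATTRIBUTE_GROUPS.items()
        }
        # Attributes outside the assumed groups are kept under "other".
        levels["other"] = {k: v for k, v in attrs.items() if k not in GROUPED}
        nested[pedestrian_id] = levels
    return nested


small = {"person1": {"speed": 1.0}}
large = {f"person{i}": {"speed": 1.0, "age_interval": "20-30"}
         for i in range(11)}
nested = choose_structure(large)
```

With eleven pedestrians the threshold is exceeded, so `nested["person0"]` holds the sub-dictionaries `motion`, `appearance`, `group`, and `other` instead of a flat attribute set.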

S3, specifically, the destination deducing step comprises the following steps:

S31, marking the positions in the actual map of the commercial area where pedestrians may stay as potential destinations, such as the crosses in FIG. 2. A potential destination can be any landmark in the commercial area, such as a shop or a promotional event.

S32, expanding virtual edges in the actual map, and constructing an expanded map taking three factors of spatial position characteristics, potential destination attractive force characteristics and attribute characteristics of pedestrians into consideration; the expanding step of the actual map comprises the following steps:

constructing a virtual point outside the actual map (such as the lowest point in FIG. 2; the virtual point is chosen arbitrarily, and its position has no practical meaning and does not participate in path planning), and connecting every potential destination point in the actual map to the virtual point to form virtual edges, where the distance on each virtual edge is the feature distance of that potential destination's attraction. For example, a destination with a store or an ongoing promotional campaign has a greater attraction than a destination without these features.

When pedestrian features are considered, the connecting edge between an actual destination and the virtual point can be expanded further. For any pedestrian, several sub-edges and sub-nodes are added on the virtual edge according to the pedestrian's attribute features, and the distance on each sub-edge represents the feature quantity of one pedestrian attribute; this completes the expansion of the actual map. As shown in FIG. 2, when two pedestrian attribute features are considered, each virtual edge is expanded with two additional sub-edges and two sub-nodes. The two added sub-nodes have no practical meaning, and the distances (weights) on the two sub-edges, W_{1,n} and W_{2,n} in FIG. 2, represent the feature quantities of the two pedestrian attributes. The original virtual edge becomes W_{3,n}.

In this way, the expanded map contains three types of factors: the spatial position features of the actual map (the actual geographic distance from point to point), the potential destination attraction feature distance (W_{3,n}), and the pedestrian's own attribute feature distances (W_{1,n} and W_{2,n}).

The feature distances are set to 10 times the geographical distance. For example, if the geographical distances are in the range of 1000 m or less, the distances representing the various features may be set between 10000 and 100000.

S33, for any pedestrian, converting the problem of choosing among multiple potential destinations into a shortest-path problem on the expanded map, and calling a shortest-path algorithm to compute the shortest path from the pedestrian's current position to the virtual point. Specifically, the A* algorithm can be used, so that the computed path jointly accounts for three factors: spatial geography, destination feature information in the map, and the pedestrian's attribute features. Once the shortest path has been computed, the destination to which the pedestrian is most likely to go is obtained simply by tracing back the actual potential destination through which the shortest path passes.
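Steps S31 to S33 can be sketched as follows. This is a hedged illustration rather than the patented implementation: it uses Dijkstra's algorithm instead of the A* algorithm named above (the virtual nodes have no position, so an A* heuristic would have to be zero, which reduces A* to Dijkstra), and the node names, road distances, and feature distances are all hypothetical.

```python
import heapq
import itertools

VIRTUAL = "virtual_point"  # the single virtual point outside the map


def expand_map(road_edges, attraction, attribute_costs):
    """Build the expanded map as an adjacency dict {node: [(next, w)]}."""
    graph = {}

    def add_edge(u, v, w):
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, [])

    for u, v, w in road_edges:          # real roads are walkable both ways
        add_edge(u, v, w)
        add_edge(v, u, w)
    for dest, w3 in attraction.items():
        prev = dest
        # One sub-edge and sub-node per pedestrian attribute (W_1, W_2, ...).
        for i, w in enumerate(attribute_costs[dest], start=1):
            child = (dest, i)           # sub-node with no physical meaning
            add_edge(prev, child, w)
            prev = child
        add_edge(prev, VIRTUAL, w3)     # remaining virtual edge (W_3)
    return graph


def most_likely_destination(graph, start, destinations):
    """Shortest path start -> VIRTUAL, then trace back the destination."""
    dist = {start: 0.0}
    parent = {}
    tie = itertools.count()             # avoids comparing node objects
    heap = [(0.0, next(tie), start)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                    # stale heap entry
        if u == VIRTUAL:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, next(tie), v))
    if VIRTUAL not in dist:
        return None                     # no destination is reachable
    path = [VIRTUAL]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    # The last real destination on the path is the inferred one.
    last = max(i for i, n in enumerate(path) if n in destinations)
    return path[last], path[:last + 1], dist[VIRTUAL]


# Hypothetical map: pedestrian P, junction X, two shops.
roads = [("P", "X", 100), ("X", "shop_a", 200), ("X", "shop_b", 350)]
attraction = {"shop_a": 30000, "shop_b": 10000}
attribute_costs = {"shop_a": [10000, 20000], "shop_b": [10000, 10000]}
graph = expand_map(roads, attraction, attribute_costs)
result = most_likely_destination(graph, "P", set(attraction))
```

In this toy map `shop_a` is geographically closer, but its larger feature distances make the path through `shop_b` the shortest one in the expanded map, so `shop_b` is returned together with the real route `P`, `X`, `shop_b`.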

S4, track prediction:

The trajectory can be calculated with an existing social force model. The inputs of the social force model comprise the current speed of the pedestrian, the spatial distance between the pedestrian and other pedestrians, the distance between the pedestrian and map obstacles, and the pedestrian's destination. The output is the predicted trajectory of the pedestrian over a future period of time.
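The description does not fix a particular social force formulation. As a rough sketch, the code below assumes a common simplified form: a driving force that relaxes the velocity toward a desired speed in the direction of the destination, plus repulsive forces that decay exponentially with distance to other pedestrians and to obstacles. All parameter values, the treatment of obstacles as points, and the assumption that other pedestrians stay still are illustrative simplifications.

```python
import math


def social_force_step(pos, vel, goal, others, obstacles, dt=0.1,
                      desired_speed=1.3, tau=0.5,
                      a_ped=2.0, b_ped=0.3, a_obs=5.0, b_obs=0.2):
    """Advance one pedestrian by one step (velocity first, then position)."""
    # Driving force: relax the velocity toward desired_speed in the
    # direction of the inferred destination.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    goal_dist = math.hypot(gx, gy) or 1e-9
    fx = (desired_speed * gx / goal_dist - vel[0]) / tau
    fy = (desired_speed * gy / goal_dist - vel[1]) / tau

    # Repulsive forces: decay exponentially with the distance to each
    # other pedestrian and to each obstacle point.
    for points, strength, scale in ((others, a_ped, b_ped),
                                    (obstacles, a_obs, b_obs)):
        for ox, oy in points:
            dx, dy = pos[0] - ox, pos[1] - oy
            d = math.hypot(dx, dy) or 1e-9
            magnitude = strength * math.exp(-d / scale)
            fx += magnitude * dx / d
            fy += magnitude * dy / d

    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel


def predict_trajectory(pos, vel, goal, others, obstacles, steps, dt=0.1):
    """Return the list of predicted positions, starting with pos."""
    trajectory = [pos]
    for _ in range(steps):
        pos, vel = social_force_step(pos, vel, goal, others, obstacles, dt)
        trajectory.append(pos)
    return trajectory


goal = (10.0, 0.0)
trajectory = predict_trajectory(
    pos=(0.0, 0.0), vel=(0.0, 0.0), goal=goal,
    others=[(5.0, 0.5)], obstacles=[(5.0, -3.0)], steps=50)
```

The returned list of coordinates has the same shape as the per-pedestrian trajectory described for the trajectory prediction module; a fuller model would update all pedestrians together at each step.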

The principle behind expanding the actual map in the embodiment of the invention is the optimal substructure of shortest paths: in a weighted graph, if the shortest path between two given nodes has been found, then any sub-segment of that path is itself a shortest path between the sub-segment's own two endpoints.

Assume that a shortest path from a start point to an end point has been found and that it passes through a series of nodes, in path order A, B, C, D, where A is the start point and D is the end point. If the shortest path from A to C is then required, it suffices to extract the part from A to C of the shortest path from A to D.
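Stated in symbols, with d(X, Y) denoting the shortest-path distance between nodes X and Y (notation introduced here for illustration only), the property reads:

```latex
d(A, D) = d(A, C) + d(C, D)
\quad \text{for every node } C \text{ on a shortest path from } A \text{ to } D .
```

Applied to the expanded map, with P the pedestrian's position, V the virtual point, and S the actual destination through which the shortest path passes, this gives d(P, V) = d(P, S) + d(S, V), so the part of the path from P to S is itself a shortest real-world route to the inferred destination.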

The expansion of the map will be further described with a specific example.

As shown in FIG. 3, a pedestrian is known to be planning to buy clothes at a clothing store. There are three clothing stores (two budget stores and one high-end store) within 1 km of the pedestrian, and the pedestrian's consumption habits favor value for money. The task is to infer which store the pedestrian is most likely to go to and the route to that store.

The conventional algorithm flow is to first search for nearby stores and find the 3 stores, then narrow these down to the 2 budget stores according to the consumption habits, and finally compute the shortest route to each of the two stores, select the one with the shorter distance, and travel along the computed route.

The method provided by the invention is as follows: a series of virtual points is first constructed, each store is connected to the virtual points, and finally the virtual points are connected to a virtual endpoint. Store attributes and pedestrian attributes are configured on these links. For example, clothing stores have their own attribute, brand value: the high-end clothing store has a high brand value (20000), and the budget clothing stores have a lower brand value (10000). The pedestrian's consumption habit is a preference for value for money, and the smaller the consumption habit value, the stronger the pedestrian's intention to go to that store. The pedestrian's consumption habit attribute values for the three clothing stores are configured as 10000, 20000, and 10000, respectively. (The positions of the virtual points have no practical meaning; only the weights of the links between points participate in the path computation.) The algorithm therefore reduces to computing the shortest path from the pedestrian's position to the virtual endpoint. The result is necessarily the bottom path in FIG. 3, which jointly reflects the shortest geographic distance, the store's brand value, and the pedestrian's consumption habits. The output only needs to give the part of the path from the pedestrian's position to the bottom budget clothing store, that is, the pedestrian goes to the budget clothing store 450 m away.
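The arithmetic behind this example can be checked with a few lines of Python. The brand values, the consumption habit values, and the 450 m distance come from the example; the geographic distances to the other two stores (700 m and 600 m) and the assignment of the value 20000 to the high-end store are assumptions, since FIG. 3 is not reproduced here.

```python
# Cost of each candidate path from the pedestrian to the virtual endpoint:
# geographic distance + brand-value weight + consumption-habit weight.
stores = {
    # geo_m for the first two stores is hypothetical (within 1 km).
    "budget_store_top":    {"geo_m": 700, "brand": 10000, "habit": 10000},
    "high_end_store":      {"geo_m": 600, "brand": 20000, "habit": 20000},
    "budget_store_bottom": {"geo_m": 450, "brand": 10000, "habit": 10000},
}

totals = {name: sum(costs.values()) for name, costs in stores.items()}
best = min(totals, key=totals.get)
```

Because the feature distances are an order of magnitude larger than the geographic ones, the high-end store is ruled out regardless of its distance, and geographic distance decides between the two budget stores.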

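As shown in FIG. 4, an embodiment of the present invention provides a path prediction system considering pedestrian portraits, comprising:

the video monitoring module is used for extracting basic features of pedestrians based on video monitoring information of the commercial area and constructing a pedestrian feature dictionary;

the dictionary updating module is used for updating the pedestrian feature dictionary based on the number of pedestrians and/or the number of pedestrian attribute features;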

a destination inference module for inferring, for each pedestrian, a destination to which each pedestrian is most likely to travel based on the updated pedestrian feature dictionary;

the track prediction module is used for inputting the current speed of the pedestrians, the space distance between the pedestrians and other pedestrians, the distance between the pedestrians and map obstacles and the destination to which the pedestrians are most likely to go into the social force model, and outputting track data of each pedestrian in a future period of time based on the social force model.

The data transmitted to the dictionary updating module by the video monitoring module is a pedestrian characteristic dictionary. In static mode, the data transfer process is only once. In the dynamic mode, the data transfer process is performed once per video signal input period.

The data transfer process from the dictionary updating module to the video monitoring module only exists in the dynamic mode, because only in the dynamic mode, the pedestrian feature dictionary is periodically updated.

The dictionary updating module transmits the pedestrian feature dictionary to the destination inference module as the basic data for destination calculation.

The destination inference module transmits the calculated destinations to the trajectory prediction module in the form of a dictionary, in which each key is a pedestrian's identifier and each value is that pedestrian's most likely destination. In the static mode, the data is transferred only once; in the dynamic mode, the result of pedestrian destination inference is transferred once every period.

The trajectory prediction module outputs the trajectory data of the pedestrians. The output is in the form of a dictionary: each key is a pedestrian's identifier, and each value is a list of coordinate positions representing that pedestrian's trajectory.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of path prediction in consideration of pedestrian portraits, comprising the steps of:

2. The path prediction method considering pedestrian portraits according to claim 1, wherein the video monitoring information is video information in a static mode or video information in a dynamic mode;

3. The path prediction method considering pedestrian portraits according to claim 1, wherein the constructing a pedestrian feature dictionary includes:

performing feature recognition on each segmented pedestrian image;

4. The path prediction method considering pedestrian portraits according to claim 2, wherein the pedestrian feature dictionary is periodically updated in the dynamic mode;

5. The path prediction method considering pedestrian portraits according to claim 1, wherein the inferring, for each pedestrian, of a destination to which the pedestrian is most likely to go based on the updated pedestrian feature dictionary includes:

6. The path prediction method considering pedestrian portraits according to claim 5, wherein the expanding of the actual map includes:

7. The path prediction method considering pedestrian portraits according to claim 5, wherein, in the expanded map, the problem of choosing among multiple potential destinations is converted into a shortest-path problem, a shortest-path algorithm is invoked to compute the shortest path from the pedestrian's current position to the virtual point, and the actual potential destination through which the shortest path passes is traced back.

8. The path prediction method considering pedestrian portraits according to claim 1, wherein the pedestrian attribute features comprise one or more of: the current position of the pedestrian, the walking speed, the height range, the age interval, the clothing type, whether the pedestrian belongs to a group, and the group type; the group types of pedestrians comprise: parents and children, lovers, friends, colleagues or friends of relatives.

9. A path prediction system that accounts for pedestrian portraits, comprising: