CN112561188A - People flow prediction method and device

Info

Publication number: CN112561188A
Application number: CN202011531911.9A
Authority: CN
Inventors: 林凡; 张秋镇; 黄富铿; 周芳华
Original assignee: GCI Science and Technology Co Ltd
Current assignee: GCI Science and Technology Co Ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-26

Abstract

The invention discloses a people flow prediction method and device. The method comprises the following steps: acquiring training data, the training data consisting of a plurality of groups of data, wherein each group of data comprises a time, a day type, an indoor temperature, an outdoor temperature and a pedestrian volume; performing cluster training on the training data using an unsupervised clustering algorithm, calculating the DB index under different K values, and selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering model; establishing K long short-term memory (LSTM) network models and training the k-th LSTM network model with the clustered k-th class data set, where k ∈ [1, K], to obtain K trained LSTM network models; inputting data to be predicted into the trained adaptive unsupervised clustering model to obtain the category corresponding to the data to be predicted; and inputting the data to be predicted into the LSTM network model corresponding to that category to obtain the predicted pedestrian volume. The invention does not require the K value to be set manually and predicts people flow with high accuracy.

Description

People flow prediction method and device

Technical Field

The invention belongs to the technical field of people flow prediction, and particularly relates to a people flow prediction method and a people flow prediction device.

Background

Through people flow prediction, the number of people in the current area can be accurately tracked, adverse events such as stampedes and theft can be avoided, queuing and waiting times can be reduced, and the load on public facilities can be relieved.

At present, a relatively common approach is people flow prediction based on unsupervised learning algorithms such as K-means. The principle of the method is relatively simple, but the K value must be tuned manually to keep the algorithm accurate on different data, so the method is only suitable for places where the people flow is relatively stable.

Disclosure of Invention

The invention provides a people flow prediction method and a people flow prediction device, which aim to solve the problem that unsupervised learning algorithms such as K-means require the K value to be set manually.

In a first aspect, an embodiment of the present invention provides a people flow prediction method, including:

acquiring training data; the training data consists of a plurality of groups of data, and each group of data comprises time, a day type corresponding to the time, indoor temperature, outdoor temperature and pedestrian volume;

performing cluster training on the training data by adopting an unsupervised clustering algorithm, calculating DB indexes under different K values, and selecting the K value with the minimum DB index as the K value of the self-adaptive unsupervised clustering algorithm to obtain a trained self-adaptive unsupervised clustering model;

establishing K long short-term memory (LSTM) network models, and training the k-th LSTM network model with the clustered k-th class data set, where k ∈ [1, K], to obtain K trained LSTM network models;

inputting data to be predicted into a trained self-adaptive unsupervised clustering model to obtain a category corresponding to the data to be predicted; the data to be predicted comprises time to be predicted, a day type corresponding to the time to be predicted, indoor temperature and outdoor temperature;

and inputting the data to be predicted into the long short-term memory (LSTM) network model corresponding to the category for prediction to obtain the predicted pedestrian volume.

Preferably, if each group of data has the form Z = (D, T, T', Time, People), the training data has the form [Z₁ Z₂ … Z_n], where D is the day type, T is the indoor temperature, T' is the outdoor temperature, Time is the time, and People is the pedestrian volume;

performing cluster training on the training data by adopting an unsupervised clustering algorithm, calculating DB indexes under different K values, selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm, and obtaining a trained adaptive unsupervised clustering model, wherein the method specifically comprises the following steps:

step 11: randomly selecting K points u_i, i ∈ [1, K], from the training data [Z₁ Z₂ … Z_n] as cluster centers, the initial value of K being K = 1;

step 12: finding all points Z_m within radius h of the cluster center u_i, Z_m ∈ {Z₁, Z₂, …, Z_n}, and denoting the set of all such points Z_m as S_i;

step 13: calculating the offset M of each cluster center u_i to the set S_i;

step 14: moving the cluster center u_i by the offset M;

step 15: repeating steps 12 to 14 until the cluster center u_i converges and no longer moves; wherein, during convergence, the frequency with which each of the n points [Z₁ Z₂ … Z_n] is visited by the different cluster centers u_i is recorded, the u_i with the highest visit frequency is the final cluster center to which the point belongs, and all points Z belonging to the same center u_i form one class;

step 16: calculating the DB index at the K value;

step 17: updating the K value to K + 1;

step 18: repeating steps 11 to 17 until the K value reaches a preset maximum value;

step 19: selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm to obtain the trained adaptive unsupervised clustering model.

Preferably, calculating the offset M of each cluster center u_i to the set S_i specifically comprises:

according to the formula

calculating the offset M of each cluster center u_i to the set S_i, where p is the number of points in S_i.

Preferably, the calculating the DB index under the K value specifically includes:

according to the formula

calculating the DB index I_DB at the K value; where C_o and C_u denote the distances from sample o and sample u, respectively, to their corresponding cluster centers, and F_i,j denotes the Euclidean distance between the centers of cluster i and cluster j, i ∈ [1, K], j ∈ [1, K].
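The formula itself is not reproduced in this text. As a minimal sketch, the index can be computed under the assumption that it follows the standard Davies-Bouldin definition, I_DB = (1/K) Σ_i max_{j≠i} (C_i + C_j) / F_i,j, where C_i is taken to be the mean distance from the members of cluster i to its center. The function names `centroid` and `davies_bouldin` are illustrative and do not come from the patent.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a list of clusters (each a list of vectors).

    Assumption: C_i is the mean distance from the members of cluster i to
    its center, and F_ij is the Euclidean distance between the centers of
    clusters i and j. Lower values indicate compact, well-separated clusters.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("the index needs at least two clusters")
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, centers[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        # Two coinciding centers would divide by zero; not handled here.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

Because the index is undefined for a single cluster, a selection loop that starts at K = 1 has to skip or penalize that case; how the patent treats K = 1 is not stated.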

Preferably, training the k-th long short-term memory (LSTM) network model with the clustered k-th class data set specifically comprises:

determining the parameters of the k-th LSTM network model, the parameters comprising: the number of hidden layer neurons, the number of output layer neurons, Dropout, batch_size, and epochs;

training the k-th LSTM network model with the clustered k-th class data set according to the parameters; wherein the loss function is the mean absolute error loss function, and the optimizer is adaptive moment estimation (Adam).

Preferably, the day type is weekday, ordinary weekend, ordinary holiday or spring festival.

In a second aspect, an embodiment of the present invention provides a people flow prediction device, including:

an acquisition unit configured to acquire training data; the training data consists of a plurality of groups of data, and each group of data comprises time, a day type corresponding to the time, indoor temperature, outdoor temperature and pedestrian volume;

the K value calculating unit is used for carrying out clustering training on the training data by adopting an unsupervised clustering algorithm, calculating DB indexes under different K values, and selecting the K value with the minimum DB index as the K value of the self-adaptive unsupervised clustering algorithm to obtain a trained self-adaptive unsupervised clustering model;

the long short-term memory (LSTM) network model training unit is used for establishing K LSTM network models and training the k-th LSTM network model with the clustered k-th class data set, where k ∈ [1, K], to obtain K trained LSTM network models;

the class acquisition unit is used for inputting data to be predicted into a trained self-adaptive unsupervised clustering model to acquire a class corresponding to the data to be predicted; the data to be predicted comprises time to be predicted, a day type corresponding to the time to be predicted, indoor temperature and outdoor temperature;

and the people flow prediction unit is used for inputting the data to be predicted into the long short-term memory (LSTM) network model corresponding to the category for prediction to obtain the predicted pedestrian volume.

Compared with the prior art, the embodiment of the invention acquires training data, the training data consisting of a plurality of groups of data, each group comprising a time, a day type corresponding to the time, an indoor temperature, an outdoor temperature and a pedestrian volume; performs cluster training on the training data using an unsupervised clustering algorithm, calculates the DB index under different K values, and selects the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm to obtain a trained adaptive unsupervised clustering model; establishes K long short-term memory (LSTM) network models and trains the k-th LSTM network model with the clustered k-th class data set, where k ∈ [1, K], to obtain K trained LSTM network models; inputs data to be predicted into the trained adaptive unsupervised clustering model to obtain the category corresponding to the data to be predicted, the data to be predicted comprising a time to be predicted, a day type corresponding to the time to be predicted, an indoor temperature and an outdoor temperature; and inputs the data to be predicted into the LSTM network model corresponding to that category to obtain the predicted pedestrian volume. Thus the K value does not need to be set manually, and the people flow prediction is accurate.

Drawings

FIG. 1 is a schematic flow chart diagram of a preferred embodiment of a people flow prediction method provided by the present invention;

FIG. 2 is a schematic structural diagram of a preferred embodiment of the people flow prediction device provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, an embodiment of the invention provides a people flow prediction method, including:

S1, acquiring training data; the training data is composed of a plurality of groups of data, and each group of data comprises time, a day type corresponding to the time, indoor temperature, outdoor temperature and pedestrian volume.

In the embodiment of the present invention, the pedestrian volume is the label of each sample, and the sample itself consists of the time, the day type corresponding to the time, the indoor temperature, and the outdoor temperature.

Preferably, the day type is weekday, ordinary weekend, ordinary holiday or spring festival. The time is specifically the current time.

It should be noted that the number of groups constituting the training data is set according to actual requirements and is not limited by the present invention; it may be at least 1000 groups, for example. Likewise, the share of the total number of groups taken by each day type is set according to actual requirements and is not limited by the present invention; for example, each day type accounts for at least 10% of the total number of groups.

For example, when the training data is composed of 1000 groups of data, the day type includes weekday, ordinary weekend, ordinary holiday and spring festival, the number of groups corresponding to weekday is at least 100, the number of groups corresponding to ordinary weekend is at least 100, the number of groups corresponding to ordinary holiday is at least 100, the number of groups corresponding to spring festival is at least 100, and the total number is equal to 1000.
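The record layout and the 10% rule from the example above can be sketched as follows. The field names, the `Sample` class, and the choice of encoding the time as an hour of the day are assumptions made for illustration; the patent does not specify a concrete encoding.

```python
from collections import Counter
from dataclasses import dataclass

DAY_TYPES = ("weekday", "ordinary weekend", "ordinary holiday", "spring festival")


@dataclass(frozen=True)
class Sample:
    day_type: str        # D
    indoor_temp: float   # T
    outdoor_temp: float  # T'
    time: float          # Time (assumed here to be the hour of the day)
    people: int          # pedestrian volume, the label


def day_type_shares(samples):
    """Fraction of the data set that falls into each day type."""
    counts = Counter(s.day_type for s in samples)
    return {d: counts.get(d, 0) / len(samples) for d in DAY_TYPES}


def is_balanced(samples, min_share=0.10):
    """True if every day type reaches the minimum share of all groups."""
    return all(share >= min_share for share in day_type_shares(samples).values())


# 1000 groups: 700 weekday, 100 for each of the other three day types.
data = (
    [Sample("weekday", 22.0, 10.0, 9.0, 120)] * 700
    + [Sample("ordinary weekend", 22.0, 12.0, 14.0, 300)] * 100
    + [Sample("ordinary holiday", 23.0, 15.0, 14.0, 450)] * 100
    + [Sample("spring festival", 21.0, 5.0, 11.0, 900)] * 100
)
print(day_type_shares(data), is_balanced(data))
```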

S2, performing cluster training on the training data by adopting an unsupervised clustering algorithm, calculating DB indexes under different K values, and selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm to obtain a trained adaptive unsupervised clustering model.

The K-means clustering algorithm is an unsupervised, distance-based clustering algorithm. Distance is used as the measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm regards a cluster as a group of objects that lie close together, so its goal is to obtain compact and well-separated clusters.

Specifically, the K-means clustering algorithm is an iterative cluster analysis algorithm. K objects are randomly selected as the initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. After the objects have been assigned, the center of each cluster is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met. The termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.
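The iteration described above can be sketched in plain Python as follows. This is a minimal illustration rather than the patent's implementation; the function name `kmeans`, the `seed` parameter, and the choice of "no object is reassigned" as the termination condition are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: returns (centers, labels) for a list of vectors."""
    rng = random.Random(seed)
    # Randomly pick k distinct samples as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest cluster center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # termination: no object was reassigned
            break
        labels = new_labels
        # Recalculate each center from the objects currently in the cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Note that K is an input here, which is exactly the manual setting the invention aims to avoid.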

S3, establishing K long short-term memory (LSTM) network models, and training the k-th LSTM network model L_k with the clustered k-th class data set, k ∈ [1, K], to obtain K trained LSTM network models.

In the embodiment of the present invention, it should be noted that K here is the K value with the smallest DB index.

S4, inputting the data to be predicted into the trained self-adaptive unsupervised clustering model to obtain the corresponding category of the data to be predicted; the data to be predicted comprises time to be predicted, a day type corresponding to the time to be predicted, indoor temperature and outdoor temperature.

S5, inputting the data to be predicted into the long short-term memory (LSTM) network model corresponding to the category to obtain the predicted pedestrian volume.
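Steps S4 and S5 amount to a dispatch: find the category of the input, then call the model trained for that category. The sketch below assumes the category is the cluster whose center is nearest in Euclidean distance and that the centers are expressed over the four input features only (the text does not say how the pedestrian volume dimension of the training samples is handled at prediction time). The models are hypothetical stand-in functions, not trained LSTM networks.

```python
import math


def assign_cluster(features, centers):
    """S4: index of the cluster center nearest to the feature vector."""
    return min(range(len(centers)), key=lambda i: math.dist(features, centers[i]))


def predict_people_flow(features, centers, models):
    """S5: route the sample to the model trained on its cluster."""
    k = assign_cluster(features, centers)
    return models[k](features)


# Hypothetical cluster centers over (D, T, T', Time), numerically encoded.
centers = [[0.0, 20.0, 10.0, 9.0], [3.0, 22.0, 5.0, 14.0]]
# Hypothetical stand-ins for the K = 2 trained models.
models = [lambda x: 120.0, lambda x: 900.0]

sample = [3.0, 21.5, 6.0, 15.0]
print(assign_cluster(sample, centers), predict_people_flow(sample, centers, models))
```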

As an example of the embodiment of the present invention, if each group of data has the form Z = (D, T, T', Time, People), the training data has the form [Z₁ Z₂ … Z_n], where D is the day type, T is the indoor temperature, T' is the outdoor temperature, Time is the time, and People is the pedestrian volume. Performing cluster training on the training data using the unsupervised clustering algorithm, calculating the DB index under different K values, and selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm to obtain the trained adaptive unsupervised clustering model then specifically comprises:

S11: randomly selecting K points u_i, i ∈ [1, K], from the training data [Z₁ Z₂ … Z_n] as cluster centers, the initial value of K being K = 1;

S12: finding all points Z_m within radius h of the cluster center u_i, Z_m ∈ {Z₁, Z₂, …, Z_n}, and denoting the set of all such points Z_m as S_i;

S13: calculating the offset M of each cluster center u_i to the set S_i;

S14: moving the cluster center u_i by the offset M;

S15: repeating S12 to S14 until the cluster center u_i converges and no longer moves; wherein, during convergence, the frequency with which each of the n points [Z₁ Z₂ … Z_n] is visited by the different cluster centers u_i is recorded, the u_i with the highest visit frequency is the final cluster center to which the point belongs, and all points Z belonging to the same center u_i form one class;

S16: calculating the DB index at the K value;

S17: updating the K value to K + 1;

S18: repeating S11 to S17 until the K value reaches a preset maximum value;

S19: selecting the K value with the minimum DB index as the K value of the adaptive unsupervised clustering algorithm to obtain the trained adaptive unsupervised clustering model.

In the embodiment of the present invention, it is preferable that the K value does not exceed 15.
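The outer loop of this procedure (S16 to S19) can be sketched independently of the clustering details. In the sketch below, `cluster` and `db_index` are caller-supplied, hypothetical callables standing in for S11 to S15 and for the DB index formula. Skipping any result with fewer than two clusters is an assumption, made because a Davies-Bouldin style index is undefined for a single cluster.

```python
import math


def select_k(data, k_max, cluster, db_index):
    """Try K = 1..k_max and keep the clustering with the lowest DB index.

    cluster(data, k)       -> one label per sample (stands in for S11-S15)
    db_index(data, labels) -> float, lower is better (stands in for S16)
    Returns (best_k, best_labels), or (None, None) if no K gave two clusters.
    """
    best_k, best_labels, best_score = None, None, math.inf
    for k in range(1, k_max + 1):      # S17/S18: K <- K + 1 up to the maximum
        labels = cluster(data, k)
        if len(set(labels)) < 2:       # assumption: skip single-cluster results
            continue
        score = db_index(data, labels)
        if score < best_score:         # S19: keep the minimum DB index
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


# Toy stand-ins: round-robin labels and a made-up score table.
fake_scores = {2: 0.9, 3: 0.4, 4: 0.7}
toy_cluster = lambda data, k: [i % k for i in range(len(data))]
toy_db = lambda data, labels: fake_scores[len(set(labels))]
print(select_k(list(range(12)), 4, toy_cluster, toy_db)[0])
```

With the preferred bound stated above, `k_max` would be at most 15.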

As an example of the embodiment of the present invention, calculating the offset M of each cluster center u_i to the set S_i specifically comprises:

according to the formula

calculating the offset M of each cluster center u_i to the set S_i, where p is the number of points in S_i.
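The formula is not reproduced in this text. A minimal sketch, assuming the offset is the usual flat-kernel mean-shift vector M = (1/p) Σ_{Z_m ∈ S_i} (Z_m − u_i), is shown below together with the repeat-until-converged loop of S12 to S14. The function names, the tolerance `tol`, and the iteration cap are illustrative assumptions.

```python
import math


def offset(center, points, h):
    """Offset M of `center` toward the set S_i of points within radius h."""
    s_i = [z for z in points if math.dist(z, center) <= h]
    if not s_i:
        return [0.0] * len(center)
    p = len(s_i)  # number of points in S_i
    return [sum(z[d] - center[d] for z in s_i) / p for d in range(len(center))]


def shift_until_converged(center, points, h, tol=1e-6, max_iter=100):
    """Repeat S12-S14 until the center effectively stops moving."""
    center = list(center)
    for _ in range(max_iter):
        m = offset(center, points, h)
        center = [c + dm for c, dm in zip(center, m)]
        if math.hypot(*m) < tol:
            break
    return center


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (50.0, 50.0)]
print(offset((0.0, 0.0), points, h=5.0))
print(shift_until_converged((0.0, 0.0), points, h=5.0))
```

Under this assumption, moving by M places the center at the mean of the points currently inside the radius, so the distant point at (50, 50) never influences the result.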

As an example of the embodiment of the present invention, the calculating the DB index under the K value specifically includes:

according to the formula

As an example of the embodiment of the present invention, the training of the kth long-short term memory network model by using the clustered kth class data set specifically includes:

It should be noted that the values of Dropout, batch_size and epochs are set according to actual requirements, and the invention is not limited thereto. For example, Dropout may be 0.5, batch_size may be 100, and epochs may be 500.

In the embodiment of the present invention, it should be understood that the mean absolute error (MAE) is a loss function for regression models: it is the average of the absolute differences between the target values and the predicted values, and represents how far the predictions deviate from the actual data. Adam (adaptive moment estimation) is an optimizer used to minimize the loss function by gradient descent; it combines the first-moment estimate of the gradient (its mean) with the second-moment estimate (its uncentered variance) to compute the update step size, and is well suited to large data sets and large numbers of parameters. A Dropout of 0.5 means that, during forward propagation, each neuron's activation is dropped at random with probability 0.5, which makes the model generalize better because it depends less on particular local features. A batch_size of 100 is the number of samples used in one training step, i.e. each step takes 100 groups of data from the 1000 groups (assuming the sample data consists of 1000 groups) to train the LSTM network. An epochs value of 500 is the total number of passes over the training data.
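To make these quantities concrete, the short sketch below computes an MAE and the number of parameter updates implied by the example settings (1000 groups, batch_size 100, epochs 500). The function names are illustrative, and the target and predicted values are made-up numbers.

```python
import math


def mean_absolute_error(targets, predictions):
    """Average of |target - prediction| over all samples."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def training_schedule(n_samples, batch_size, epochs):
    """Return (steps per epoch, total parameter updates)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Made-up pedestrian volumes and predictions.
print(mean_absolute_error([100, 250, 400], [110, 240, 430]))
print(training_schedule(1000, 100, 500))
```

With these settings each epoch consists of 10 batches, so training performs 5000 updates in total.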

Example 2:

Referring to FIG. 2, an embodiment of the present invention provides a people flow prediction device, including:

an acquisition unit 1 for acquiring training data; the training data consists of a plurality of groups of data, and each group of data comprises time, a day type corresponding to the time, indoor temperature, outdoor temperature and pedestrian volume;

the K value calculating unit 2 is used for carrying out clustering training on the training data by adopting an unsupervised clustering algorithm, calculating DB indexes under different K values, and selecting the K value with the minimum DB index as the K value of the self-adaptive unsupervised clustering algorithm to obtain a trained self-adaptive unsupervised clustering model;

the long short-term memory (LSTM) network model training unit 3 is used for establishing K LSTM network models and training the k-th LSTM network model with the clustered k-th class data set, where k ∈ [1, K], to obtain K trained LSTM network models;

the category acquisition unit 4 is used for inputting data to be predicted into a trained self-adaptive unsupervised clustering model to acquire a category corresponding to the data to be predicted; the data to be predicted comprises time to be predicted, a day type corresponding to the time to be predicted, indoor temperature and outdoor temperature;

and the people flow prediction unit 5 is used for inputting the data to be predicted into the long short-term memory (LSTM) network model corresponding to the category for prediction to obtain the predicted pedestrian volume.

As an example of the embodiment of the present invention, if each group of data has the form Z = (D, T, T', Time, People), the training data has the form [Z₁ Z₂ … Z_n], where D is the day type, T is the indoor temperature, T' is the outdoor temperature, Time is the time, and People is the pedestrian volume;

according to the formula

calculating the DB index I_DB at the K value; where C_o and C_u denote the distances from sample o and sample u, respectively, to their corresponding cluster centers, and F_i,j denotes the Euclidean distance between the centers of cluster i and cluster j, i ∈ [1, K], j ∈ [1, K].

As an example of the embodiment of the present invention, the day type is weekday, ordinary weekend, ordinary holiday or spring festival.

It should be noted that all or part of the flow in the method of the above embodiments may also be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be further noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions the computer-readable medium does not include electrical carrier signals and telecommunications signals.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A people flow prediction method is characterized by comprising the following steps:

acquiring training data; the training data consists of a plurality of groups of data, wherein each group of data comprises time, a day type corresponding to the time, indoor temperature, outdoor temperature and pedestrian volume;

2. The people flow prediction method according to claim 1, wherein if each group of data has the form Z = (D, T, T', Time, People), the training data has the form [Z₁ Z₂ … Z_n], where D is the day type, T is the indoor temperature, T' is the outdoor temperature, Time is the time, and People is the pedestrian volume;

step 11: randomly selecting K points u_i, i ∈ [1, K], from the training data [Z₁ Z₂ … Z_n] as cluster centers, the initial value of K being K = 1;

step 14: moving the cluster center u_i by the offset M;

step 15: repeating steps 12 to 14 until the cluster center u_i converges and no longer moves; wherein, during convergence, the frequency with which each of the n points [Z₁ Z₂ … Z_n] is visited by the different cluster centers u_i is recorded, the u_i with the highest visit frequency is the final cluster center to which the point belongs, and all points Z belonging to the same center u_i form one class;

step 16: calculating the DB index at the K value;

step 17: updating the K value to K + 1;

step 18: repeating steps 11 to 17 until the K value reaches a preset maximum value;

3. The people flow prediction method according to claim 2, wherein calculating the offset M of each cluster center u_i to the set S_i specifically comprises:

according to the formula

4. The people flow prediction method according to claim 2, wherein the calculating the DB index under the K value specifically includes:

according to the formula

5. The people flow prediction method according to claim 1, wherein training the k-th long short-term memory (LSTM) network model with the clustered k-th class data set specifically comprises:

6. The people flow prediction method according to claim 1, wherein the day type is weekday, ordinary weekend, ordinary holiday, or spring festival.

7. A people flow prediction device, comprising:

8. The people flow prediction device of claim 7, wherein if each group of data has the form Z = (D, T, T', Time, People), the training data has the form [Z₁ Z₂ … Z_n], where D is the day type, T is the indoor temperature, T' is the outdoor temperature, Time is the time, and People is the pedestrian volume;

step 11: randomly selecting K points u_i, i ∈ [1, K], from the training data [Z₁ Z₂ … Z_n] as cluster centers, the initial value of K being K = 1;

step 14: moving the cluster center u_i by the offset M;

step 16: calculating the DB index at the K value;

step 17: updating the K value to K + 1;

9. The people flow prediction device of claim 7, wherein calculating the offset M of each cluster center u_i to the set S_i specifically comprises:

according to the formula

10. The people flow rate prediction device according to claim 7, wherein the calculating the DB index at the K value specifically includes:

according to the formula