CN115907181A

CN115907181A - Urban rail transit passenger flow prediction method and system

Info

Publication number: CN115907181A
Application number: CN202211531806.4A
Authority: CN
Inventors: 王丹丹; 马卓; 张晓玲; 李蕾; 员珍珍; 冉苒
Original assignee: Zhengzhou Railway Vocational and Technical College
Current assignee: Zhengzhou Railway Vocational and Technical College
Priority date: 2022-12-01
Filing date: 2022-12-01
Publication date: 2023-04-04
Anticipated expiration: 2042-12-01
Also published as: CN115907181B

Abstract

The invention relates to the technical field of urban traffic data processing, and in particular to a method and a system for predicting urban rail transit passenger flow. The method collects the number of passengers in different time periods to form passenger flow data; forms historical passenger flow data from the passenger flow data of dates that fall on the same day of the week as the date to be predicted; and acquires the historical similarity between the date to be predicted and the historical passenger flow data. It then acquires the historical similarity of the dates adjacent to the date to be predicted and derives a similarity continuity from the change in historical similarity across the date to be predicted and the adjacent dates. When the similarity continuity is greater than a preset threshold, the method obtains an error probability for each adjacent date, derives a weight value from it, and obtains the passenger flow prediction data of the date to be predicted from the weight values of the adjacent dates and their passenger flow data. The invention can still obtain accurate prediction data when emergencies occur, improving the accuracy of passenger flow prediction.

Description

Urban rail transit passenger flow prediction method and system

Technical Field

The invention relates to the technical field of urban traffic data processing, in particular to a method and a system for predicting passenger flow of urban rail transit.

Background

The subway is an important carrier of urban rail transit in Chinese cities. It offers large capacity, high speed, and punctuality, and contributes significantly to urban traffic. At peak passenger flow, however, the subway remains under heavy pressure: stations become crowded, which can affect both the personal safety of passengers and the safety of subway operation. Obtaining passenger flow in time, as the basis for operation scheduling, is therefore very important for subway operation.

In the prior art, passenger flow is predicted from historical subway data either by regression fitting of the historical data or by training a network to learn and fit its characteristics. In actual operation, however, subway passenger flow is easily affected by unexpected events such as holidays and weather, which cause large fluctuations. As a result, predictions based on historical data alone have poor accuracy.

Disclosure of Invention

In order to solve the problem of poor passenger flow prediction accuracy in emergency situations, the invention provides a passenger flow prediction method and a passenger flow prediction system for urban rail transit, and the adopted technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides a method for predicting passenger flow in urban rail transit, including the following steps:

collecting the number of passengers in different time periods during subway operating hours, wherein the sequence formed by the numbers of passengers in all time periods is the passenger flow data of the corresponding date;

selecting, from the passenger flow data of a preset number of days before the date to be predicted, the passenger flow data of the dates that fall on the same day of the week as the date to be predicted, to form the historical passenger flow data of the date to be predicted; acquiring the similarity between every two pieces of passenger flow data in the historical passenger flow data, constructing a Gaussian model of all the similarities, and acquiring the historical similarity between the date to be predicted and the historical passenger flow data under the Gaussian model;

acquiring the historical similarity of each adjacent date of the date to be predicted under the corresponding Gaussian model; acquiring a similarity continuity according to the change in historical similarity across the date to be predicted and the adjacent dates; and, when the similarity continuity is greater than a preset threshold, acquiring the error probability of each adjacent date according to the historical similarities of the date to be predicted and the adjacent date and the time difference between them; the adjacent dates being the dates within the preset number of days before the date to be predicted;

and obtaining a weight value for each adjacent date by applying a negative correlation mapping to its error probability, and obtaining the passenger flow prediction data of the date to be predicted according to the weight values of the adjacent dates and the corresponding passenger flow data.

Further, the similarity between every two pieces of passenger flow data is obtained as follows:

calculating the DTW distance between every two pieces of passenger flow data as the corresponding difference, and applying a negative correlation mapping to the difference to obtain the similarity between the two pieces of passenger flow data.

Further, obtaining the historical similarity between the date to be predicted and the historical passenger flow data under the Gaussian model includes:

taking each date as a node and the difference between every two pieces of passenger flow data as the edge weight between the corresponding nodes, and constructing a complete undirected graph by using a shortest path algorithm; acquiring the actual numbers of passengers in all time periods within a preset time length on the date to be predicted to form an actual sequence, calculating the difference between the actual sequence and the subsequence of each node's passenger flow data within the preset time length, and selecting the node with the maximum difference as a target node;

and calculating the similarity between the actual sequence of the date to be predicted and the subsequence of the target node, and substituting the similarity into the corresponding Gaussian model to obtain the historical similarity of the date to be predicted.

Further, the method for obtaining the similarity continuity includes:

taking, for each adjacent date, the absolute value of the difference between its historical similarity and that of the previous adjacent date as a new historical similarity; acquiring the average value and the peak values of the new historical similarities of all adjacent dates; taking the adjacent date of the peak closest to the date to be predicted as a target date; taking the difference between the new historical similarity of the target date and the average value as a first difference; taking the absolute value of the difference between the new historical similarities of the date to be predicted and the target date as a second difference; and taking the product of the first difference and the second difference as the similarity continuity.

Further, the method for obtaining the error probability comprises:

taking the probability corresponding to the time difference between the date to be predicted and the adjacent date in the standard normal distribution as an adjustment coefficient; taking the product of the adjustment coefficient and a preset adjustment coefficient as a final adjustment coefficient; taking the product of the historical similarity of the adjacent date and the final adjustment coefficient as the adjusted similarity of that adjacent date; and selecting the smaller of the historical similarity of the date to be predicted and the adjusted similarity as the error probability of that adjacent date.

Further, the method for obtaining the weight value comprises:

using the negative of the error probability as the exponent of the natural constant e to obtain an exponential function value, and normalizing the calculated exponential function values; the normalization result is the weight value of the corresponding adjacent date.
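As a minimal sketch of this mapping, the weights could be computed as below. It assumes that "normalizing" means dividing each exponential value by the sum over all adjacent dates, which the text does not specify; the function name `weights_from_error_probabilities` is illustrative only.

```python
import math


def weights_from_error_probabilities(error_probs):
    """Map error probabilities W_k to weights via exp(-W_k), then normalize.

    Assumption: normalization divides by the sum, so the weights add up to 1.
    """
    raw = [math.exp(-w) for w in error_probs]  # negative correlation mapping
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical error probabilities for three adjacent dates.
weights = weights_from_error_probabilities([0.05, 0.20, 0.60])
print(weights)  # the date with the largest error probability gets the smallest weight
```

With this choice a larger error probability always yields a smaller weight, which matches the negative correlation the method requires.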

Further, the passenger flow prediction data is obtained by the following method:

and acquiring a preset initial weight value and a weighted average value of the weight values, and acquiring passenger flow prediction data by using a weighted moving average method according to the weighted average value and the passenger flow data of the corresponding adjacent date.

In a second aspect, another embodiment of the present invention provides an urban rail transit passenger flow prediction system, which includes a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the computer program, implements the steps of the urban rail transit passenger flow prediction method.

The invention has at least the following beneficial effects:

First, the number of passengers in different time periods is counted for each date, and the resulting sequence is used as the daily passenger flow data, reflecting the passenger flow in the different time periods of each day. Next, the historical passenger flow data of the date to be predicted is selected and grouped by day of the week, which reflects the periodicity of the historical data. A Gaussian model is constructed from the similarity between every two pieces of passenger flow data, quantifying the data that share the same day of the week, and the historical similarity between the date to be predicted and the historical passenger flow data under the corresponding Gaussian model is obtained, in order to judge how accurate it is to use the historical passenger flow data as the reference for the date to be predicted. Then the historical similarity of each date adjacent to the date to be predicted is obtained under its corresponding Gaussian model, to judge how accurate it is to use the historical passenger flow data as the reference for those adjacent dates, and the similarity continuity is obtained from the change in historical similarity across the date to be predicted and the adjacent dates. If the similarity continuity is greater than a preset threshold, using the historical passenger flow data as the reference for the date to be predicted is inaccurate and an emergency may have occurred; in that case the passenger flow data of the adjacent dates is used as the prediction basis. The error probability of each adjacent date is obtained from the historical similarities of the date to be predicted and the adjacent date and the time difference between them; it evaluates how likely an error is when that adjacent date's passenger flow data is used as the prediction basis, so that weights can be assigned accordingly: the larger the error probability, the lower the reference value of the adjacent date and the smaller its weight value. Finally, the passenger flow prediction data of the date to be predicted is obtained from the weight values of the adjacent dates and the corresponding passenger flow data, so that accurate prediction data can still be obtained under emergency conditions and the accuracy of passenger flow prediction is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart illustrating steps of a method for predicting passenger flow in urban rail transit according to an embodiment of the present invention.

Detailed Description

To further illustrate the technical means and effects adopted by the present invention to achieve its intended objects, the method and system for predicting passenger flow in urban rail transit according to the present invention, together with their specific implementation, structure, features, and effects, are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the urban rail transit passenger flow prediction method and system provided by the invention in detail with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of steps of a method for predicting passenger flow in urban rail transit according to an embodiment of the present invention is shown, where the method includes the following steps:

Step S001, collecting the number of passengers in different time periods during subway operating hours, wherein the sequence formed by the numbers of passengers in all time periods is the passenger flow data of the corresponding date.

For any subway station in a city, the embodiment of the invention collects the number of passengers through the gate system, which provides the card swiping records of all gates in the station. Each card swipe indicates passenger flow at the station: whether the swipe is at an entrance or an exit gate, the more swipes the gates record, the more passengers are in the station and the larger the current passenger flow. The number of passengers in the station in each time period is obtained from the total card swiping records of all gates in that station. In the embodiment of the invention, a time period is 10 minutes, so the total card swiping records of all gates are counted once every 10 minutes to give the number of passengers in the station. The duration of the time period can be adjusted according to actual requirements.

During the operating hours of the subway station, the number of passengers is collected once per time period, and all the numbers of passengers within the operating hours form a sequence that is recorded as the passenger flow data of the corresponding date. That is, one day's passenger flow data covers the period from the start to the end of the station's operation on that day.
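The counting in step S001 can be sketched as follows. This is an illustration only: the function name `daily_flow`, its parameters, and the sample timestamps are hypothetical, and the sketch assumes each swipe record is available as a `datetime`.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_flow(swipes, open_time, close_time, period_minutes=10):
    """Count gate swipes per fixed-length period within operating hours.

    Returns one count per period, in time order; this list plays the role
    of the passenger flow data for the date.
    """
    period = timedelta(minutes=period_minutes)
    n_periods = (close_time - open_time) // period  # whole periods only
    counts = Counter()
    for t in swipes:
        if open_time <= t < close_time:  # ignore swipes outside operating hours
            counts[(t - open_time) // period] += 1
    return [counts[i] for i in range(n_periods)]


# Hypothetical swipes during one hour of operation.
day = datetime(2022, 11, 26)
swipes = [day.replace(hour=6, minute=m) for m in (1, 9, 10, 59)]
print(daily_flow(swipes, day.replace(hour=6), day.replace(hour=7)))
# [2, 1, 0, 0, 0, 1]
```

Changing `period_minutes` corresponds to adjusting the duration of the time period.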

Step S002, selecting, from the passenger flow data of the preset number of days before the date to be predicted, the passenger flow data of the dates that fall on the same day of the week as the date to be predicted, to form the historical passenger flow data of the date to be predicted; acquiring the similarity between every two pieces of passenger flow data in the historical passenger flow data, constructing a Gaussian model of all the similarities, and acquiring the historical similarity between the date to be predicted and the historical passenger flow data under the Gaussian model.

Because subway passenger flow is influenced by working days and rest days, passenger flow data from the same day of the week is relatively similar. For example, the passenger flow data of two Fridays are similar to each other, as are those of two Sundays, but the passenger flow data of a Friday differs considerably from that of a Sunday. Therefore, the passenger flow data of earlier dates that fall on the same day of the week as the date to be predicted is a meaningful reference for comparison.

In the embodiment of the present invention, the preset number of days is 100. For example, if the date to be predicted is November 26, 2022, a Saturday, then the passenger flow data of all Saturdays within the 100 days before November 26, 2022 is selected to form the historical passenger flow data of November 26, 2022.
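The date selection in this example can be sketched in a few lines. The function name `same_weekday_history` is hypothetical; the sketch only enumerates the candidate dates and leaves loading their passenger flow data to the caller.

```python
from datetime import date, timedelta


def same_weekday_history(target, preset_days=100):
    """Dates within `preset_days` before `target` that share its day of the week.

    Stepping back in multiples of 7 days keeps the day of the week fixed.
    """
    return [target - timedelta(days=d) for d in range(7, preset_days + 1, 7)]


history_dates = same_weekday_history(date(2022, 11, 26))
print(len(history_dates))                 # 14
print(history_dates[0], history_dates[-1])  # 2022-11-19 2022-08-20
```

For the 100-day window of the embodiment this gives 14 earlier Saturdays, the oldest being 98 days before the date to be predicted.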

It should be noted that, in other embodiments, the preset number of days may be adjusted according to actual conditions. It should not be too small, since too little data provides insufficient reference and makes the prediction inaccurate. Nor should it be too large: passenger flow data has a certain seasonality, and too much data mixes in too much irrelevant data, which also makes the prediction inaccurate.

The DTW distance between every two pieces of passenger flow data in the selected historical passenger flow data is calculated as the corresponding difference, and a negative correlation mapping is applied to the difference to obtain the similarity between the two pieces of passenger flow data.

The DTW distance is the distance between two time series obtained by the dynamic time warping (DTW) algorithm. DTW is a well-known algorithm in the field of data processing, and its detailed process is not described in this embodiment. The larger the DTW distance, the larger the difference between the two sequences, so the similarity between two pieces of passenger flow data is obtained by a negative correlation mapping of their DTW distance. In the embodiment of the invention the negative correlation mapping is implemented by the function exp(-x), the exponential function with the natural constant e as base and -x as exponent. Denoting the DTW distance between two pieces of passenger flow data as J, the corresponding similarity is exp(-J).
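For illustration, a textbook dynamic-programming form of DTW and the exp(-J) mapping are sketched below. The patent does not fix the DTW variant, so the absolute-difference local cost and the unconstrained warping window used here are assumptions, as are the function names.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance with |x - y| as the local cost (assumed variant)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def similarity(a, b):
    """Negative correlation mapping exp(-J) of the DTW distance J."""
    return math.exp(-dtw_distance(a, b))


print(similarity([1, 2, 3], [1, 2, 2, 3]))  # 1.0, the warping absorbs the repeated value
```

Raw passenger counts produce large distances, for which exp(-J) becomes very small. The source does not say whether the sequences are scaled first, so this sketch leaves that choice to the caller.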

Each date is taken as a node and the difference between every two pieces of passenger flow data as the edge weight between the corresponding nodes, and a complete undirected graph is constructed by using a shortest path algorithm. The actual numbers of passengers in all time periods within a preset time length on the date to be predicted are acquired to form an actual sequence, the difference between the actual sequence and the subsequence of each node's passenger flow data within the preset time length is calculated, and the node with the largest difference is selected as the target node.

Because there are multiple pieces of historical data for the same day of the week and no reference standard among them, the historical similarity for that day of the week is calculated by taking the date of each piece of passenger flow data as a node and building a complete undirected graph. The edge weight between any two nodes in the complete undirected graph is the DTW distance J between the passenger flow data of the two nodes. Since an edge weight exists between every pair of nodes, the data volume is too large and the similarity between the data cannot be seen intuitively. A path with the maximum total DTW distance is therefore obtained through a shortest path algorithm, in which each node links to only one other node and the sum of all edge weights is maximal. Denoting the number of nodes as n, the path contains n-1 edge weights in total.

Although the pieces of passenger flow data differ, they should be broadly similar: passenger flow data from the same day of the week reflects similar travel habits and belongs to the same kind of day (working day or holiday), so even where differences exist, the overall passenger flow data remains highly similar.

Denote the day of the week of the date to be predicted as the w-th week value. The similarities corresponding to the n-1 edge weights, $\{\exp(-J_1), \exp(-J_2), \ldots, \exp(-J_{n-1})\}$, are obtained, and their mean $Z_w$ and variance $F_w$ are calculated to give the single Gaussian model $G_w$ for the w-th week value. Under this model, let $J'$ be the maximum distance between the passenger flow data acquired on the date to be predicted and the historical passenger flow data of all nodes in the complete undirected graph. The closer its negative correlation mapping $\exp(-J')$ is to $Z_w$, the more the difference between the new passenger flow data and the historical passenger flow data lies within an acceptable range, and the larger the value of $G_w(\exp(-J'))$.

To better represent how closely the new passenger flow data approximates the historical passenger flow data, the coefficient of the single Gaussian model $G_w$ is adjusted so that the maximum value of $G_w$ is 1. This can be done by multiplying the single Gaussian model by the reciprocal of its maximum value, or by normalizing the single Gaussian model.
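A single Gaussian scaled to a peak of 1 can be sketched as below. Two points are assumptions rather than statements of the source: the variance is taken as the population variance, and the model is written directly as exp(-(x - Z)^2 / (2F)), which is the Gaussian density multiplied by the reciprocal of its maximum. The function name is illustrative.

```python
import math
from statistics import fmean, pvariance


def fit_unit_peak_gaussian(similarities):
    """Return (Z_w, F_w, G_w) for a list of pairwise similarities.

    G_w(x) = exp(-(x - Z_w)^2 / (2 * F_w)) has maximum value 1 at x = Z_w.
    """
    z = fmean(similarities)
    f = pvariance(similarities, mu=z)  # assumption: population variance
    if f == 0:
        raise ValueError("all similarities are identical; variance is zero")

    def g(x):
        return math.exp(-((x - z) ** 2) / (2.0 * f))

    return z, f, g


# Hypothetical similarities exp(-J_1), ..., exp(-J_{n-1}) for one day of the week.
z_w, f_w, g_w = fit_unit_peak_gaussian([0.2, 0.4, 0.6])
print(g_w(z_w))   # 1.0 at the mean
print(g_w(0.2))   # smaller the further the input lies from the mean
```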

The node with the maximum difference, that is, the node corresponding to the maximum distance $J'$, is selected as the target node. The similarity between the actual sequence of the date to be predicted and the subsequence of the target node is calculated and substituted into the corresponding Gaussian model to obtain the historical similarity of the date to be predicted.

The actual sequence consists of the actual numbers of passengers acquired within a preset time length on the date to be predicted. In the embodiment of the invention, the period from 0:00 to 6:00 is taken as the preset time length, that is, the numbers of passengers acquired in all time periods from 0:00 to 6:00 form the actual sequence. If the station to be predicted does not operate around the clock, a certain length of time at the start of its operation on the date to be predicted is taken as the preset time length, for example the first two hours of operation, and the corresponding numbers of passengers form the actual sequence. In other words, the passenger flow prediction in this embodiment predicts the passenger flow data after 6:00 a.m., or after the first two hours of operation.

Because the DTW algorithm aligns the start and end points of the two sequences before calculating the distance, the approximation between the actual sequence and the target node's passenger flow data must be calculated on data from the same time interval; that is, the subsequence of the target node's passenger flow data within the preset time length is extracted for the calculation. If the time intervals differ, the result of the DTW algorithm will deviate considerably.
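Putting the target-node selection and the Gaussian lookup together gives the sketch below. Everything named here is illustrative: the history is assumed to be a dict from date label to full-day sequence, the Gaussian parameters `z_w` and `f_w` are assumed to be precomputed, and the unit-peak form exp(-(x - Z)^2 / (2F)) and the absolute-difference DTW cost are assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance with |x - y| as the local cost (assumed variant)."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def historical_similarity(actual, history, z_w, f_w):
    """Return (target node, historical similarity) for the date to be predicted.

    actual:  counts observed so far on the date to be predicted
    history: {date label: full-day sequence} for the same day of the week
    """
    m = len(actual)
    # Compare only the part of each historical day covering the same periods.
    distances = {d: dtw_distance(actual, seq[:m]) for d, seq in history.items()}
    target = max(distances, key=distances.get)   # node with the largest distance J'
    sim = math.exp(-distances[target])           # exp(-J')
    return target, math.exp(-((sim - z_w) ** 2) / (2.0 * f_w))


# Hypothetical, pre-scaled sequences.
history = {"sat-1": [0.1, 0.2, 0.9], "sat-2": [0.3, 0.4, 0.9]}
print(historical_similarity([0.1, 0.2], history, z_w=0.67, f_w=0.01))
```

Slicing each historical sequence to the length of the actual sequence is one simple way to honor the same-interval requirement when both use the same period length and start time.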

Step S003, acquiring historical similarity of the adjacent date of the date to be predicted under the corresponding Gaussian model, acquiring similarity continuity according to change of the historical similarity corresponding to the date to be predicted and the adjacent date, and acquiring error probability of each adjacent date according to the historical similarity of the date to be predicted and the adjacent date and time difference when the similarity continuity is greater than a preset threshold value; the adjacent date is each date within a preset number of days before the date to be predicted.

In the embodiment of the present invention, with a preset number of days of 100, the adjacent dates are the days within the 100 days before the date to be predicted, and the historical similarity of each adjacent date is obtained by the same method used for the date to be predicted in step S002.

If the historical similarity between the date to be predicted and the historical passenger flow data is low, the periodicity of the date to be predicted has been weakened by an emergency factor. If the historical similarity of the adjacent dates is also low and is continuous with that of the date to be predicted, the emergency is ongoing, for example because of holidays or sustained weather changes. In that case, the passenger flow data of the adjacent days is the more credible basis for predicting the passenger flow data of the date to be predicted.

For each adjacent date, the absolute value of the difference between its historical similarity and that of the previous adjacent date is taken as a new historical similarity. The average value and the peak values of the new historical similarities of all adjacent dates are acquired, and the adjacent date of the peak closest to the date to be predicted is taken as the target date. The difference between the new historical similarity of the target date and the average value is the first difference, the absolute value of the difference between the new historical similarities of the date to be predicted and the target date is the second difference, and the product of the first difference and the second difference is the similarity continuity.

If the historical similarities of the adjacent dates and of the date to be predicted change regularly, the adjacent dates and the date to be predicted are continuous; the continuity may be linear or Gaussian. Therefore, the historical similarity of the previous adjacent date is subtracted from that of the next adjacent date, and the absolute value of the difference is used as the new historical similarity. If the two original historical similarities are close, the new historical similarity is small, even close to 0; if they differ greatly, a peak point appears. If two peak points are close to each other and differ greatly from the non-peak points, there is continuity between them.

Since the continuity judgment determines whether the historical similarity of the adjacent dates is continuous with that of the date to be predicted, the new historical similarity of the date to be predicted is first treated as a peak point. If it really is a peak point, the data has only just changed abruptly, that is, the historical similarity of the adjacent dates is not continuous with that of the date to be predicted.

Denote the date to be predicted as day i and its new historical similarity as $S'_i$. All peaks among the new historical similarities are extracted by a peak extraction method, and the peak point closest to day i is selected; its date is denoted day j and its new historical similarity $S'_j$.

Since day j is one of the 100 days before day i, that is, j < i, $S'_j$ must precede $S'_i$, and day j is the day with a change that is closest to day i.

If $S'_j$ is a large peak, the difference between $S'_i$ and $S'_j$ is small, and both $S'_i$ and $S'_j$ differ markedly from the overall level, then the change on day j is similar to the change on day i. The two changes may correspond to two phases, for example one rising and the other falling, which indicates that day i may be the end of the continuous period.

If $S'_j$ is a large peak and the difference between $S'_i$ and $S'_j$ is large, with $S'_i$ having a small value, then the two changes have not returned the new historical similarity to a normal level. This indicates that the subway emergency may still be continuing, and the passenger flow data of the days between j and i is the more useful reference.

If $S'_j$ is a small peak and $S'_i$ is also small, there may be no emergency.

If $S'_j$ is a small peak and $S'_i$ is large, an emergency may just be beginning, in which case the earlier data of the adjacent dates likewise cannot be used as a reference.

That is, only when $S'_j$ is a large peak and the difference between $S'_i$ and $S'_j$ is large does the data between day j and day i represent valid continuous data; in all other cases the data is discontinuous from the viewpoint of continuity.

Thus, the average $S'$ of all new historical similarities is obtained, and the first difference $S'_j - S'$ is calculated; the larger $S'_j - S'$, the larger $S'_j$ is relative to the whole, that is, the more $S'_j$ is a large peak. The second difference $|S'_i - S'_j|$ is then calculated; the larger $|S'_i - S'_j|$, the greater the difference between $S'_i$ and $S'_j$ and the more likely it is that the subway emergency is still continuing. Combining the two gives the similarity continuity

$$P = (S'_j - S') \times |S'_i - S'_j|$$

The larger $S'_j - S'$ and $|S'_i - S'_j|$ are, the more $S'_j$ is a large peak that differs greatly from $S'_i$, the more likely it is that day i lies within an ongoing emergency, and the larger the corresponding similarity continuity P.
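The computation of P can be sketched as follows. Several details are assumptions, since the source only refers to "a peak extraction method": a peak is taken to be a strict interior local maximum among the adjacent dates, the average is taken over the adjacent dates only, and P is set to 0 when no peak exists. The function name and sample values are hypothetical.

```python
def similarity_continuity(hist_sim):
    """Similarity continuity P from historical similarities in date order.

    hist_sim: historical similarities of the adjacent dates followed by that
              of the date to be predicted (last element).
    """
    if len(hist_sim) < 3:
        raise ValueError("need at least two adjacent dates plus the target date")
    # New historical similarity: absolute change from the previous date.
    new = [abs(b - a) for a, b in zip(hist_sim, hist_sim[1:])]
    s_i = new[-1]            # date to be predicted
    adjacent = new[:-1]      # adjacent dates
    mean = sum(adjacent) / len(adjacent)
    # Assumption: a peak is a strict local maximum among the adjacent dates.
    peaks = [k for k in range(1, len(adjacent) - 1)
             if adjacent[k] > adjacent[k - 1] and adjacent[k] > adjacent[k + 1]]
    if not peaks:
        return 0.0           # assumption: no peak means no continuity signal
    s_j = adjacent[peaks[-1]]  # peak closest to the date to be predicted
    return (s_j - mean) * abs(s_i - s_j)


# Similarity drops sharply and then stays low through the date to be predicted.
p = similarity_continuity([0.9, 0.9, 0.9, 0.3, 0.3, 0.3])
print(p, p > 0.2)
```

In this example the drop produces one large peak while the date to be predicted shows no further change, so P is about 0.27 and exceeds the 0.2 threshold used in the embodiment.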

When the similarity continuity is larger than the preset threshold value, the ith day is in the continuous process of the emergency, and the error probability of the adjacent date needs to be calculated.

As an example, the preset threshold value is 0.2 in the embodiment of the present invention.

And taking the corresponding probability of the time difference between the date to be predicted and the adjacent date in the standard normal distribution as an adjusting coefficient, taking the product of the adjusting coefficient and a preset adjusting coefficient as a final adjusting coefficient, taking the product of the historical similarity of the adjacent date and the final adjusting coefficient as the adjusting similarity of the corresponding adjacent date, and selecting the smaller value from the historical similarity and the adjusting similarity of the date to be predicted as the error probability of the corresponding adjacent date.

The adjustment coefficient is u (i-k), where u () represents a standard normal distribution, u (i-k) represents the corresponding ordinate in the standard normal distribution with i-k as the abscissa, i represents the day number label of the date to be predicted, and k represents the day number label of the adjacent date. The mean value of the standard normal distribution is 0, the standard deviation is 1, namely the ordinate when the abscissa is 0 is the maximum value, i-k >0 is taken as the right half part of the standard normal distribution, and the smaller the value of i-k, the closer the kth day as the adjacent date and the date to be predicted is, the larger the corresponding ordinate is, namely the larger the adjustment coefficient is.

The product of the adjustment coefficient $u(i-k)$ and a preset adjustment coefficient h is taken as the final adjustment coefficient $h \times u(i-k)$, and the product of the final adjustment coefficient and the historical similarity of the adjacent date, $h \times u(i-k) \times S_k$, is taken as the adjusted similarity of that adjacent date, where $S_k$ denotes the historical similarity of day k as the adjacent date. As an example, the preset adjustment coefficient h is 3 in the embodiment of the present invention. The larger the adjustment coefficient, the larger the corresponding final adjustment coefficient and the more likely it is to exceed 1, in which case the historical similarity $S_k$ of the adjacent date is scaled up; conversely, the smaller the adjustment coefficient, the smaller the final adjustment coefficient and the more likely it is to be less than 1, in which case $S_k$ is scaled down.

The smaller of the historical similarity of the date to be predicted and the adjusted similarity of the adjacent date is selected as the error probability of the corresponding adjacent date:

$$W_k = \min(S_i,\ h \times u(i-k) \times S_k)$$

where $W_k$ denotes the error probability of day k as an adjacent date, $S_i$ denotes the historical similarity of day i as the date to be predicted, $h \times u(i-k) \times S_k$ denotes the adjusted similarity of day k, and $\min()$ is the minimum function.
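The error probability formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the function names and sample similarities are hypothetical, and $u()$ is taken to be the standard normal probability density, as the description of the ordinate suggests.

```python
import math


def std_normal_pdf(x):
    """Ordinate u(x) of the standard normal density (mean 0, std 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def error_probability(s_i, s_k, i, k, h=3.0):
    """W_k = min(S_i, h * u(i - k) * S_k) for an adjacent date k < i."""
    if i - k <= 0:
        raise ValueError("the adjacent date must precede the date to be predicted")
    adjusted = h * std_normal_pdf(i - k) * s_k  # adjusted similarity of day k
    return min(s_i, adjusted)


# Hypothetical similarities for illustration only.
w_near = error_probability(s_i=0.9, s_k=0.5, i=10, k=9)  # one day back
w_far = error_probability(s_i=0.9, s_k=0.5, i=10, k=7)   # three days back
```

The default `h=3.0` mirrors the example value given in the embodiment; any other value can be passed explicitly.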

$S_i$ is the similarity between the passenger flow data already generated on day i and the historical passenger flow data. The larger its value, the higher that similarity, and the stronger the reference value of the historical passenger flow data belonging to the same week value when it is used for prediction. The smaller its value, the lower that similarity and the weaker the reference value of the historical passenger flow data belonging to the same week value; in that case, to keep the prediction accurate, the passenger flow data of the adjacent dates must be relied on more heavily. Thus, the larger the value of $S_i$, the lower the error probability, and the smaller the value, the higher the error probability.

$W_k$ is the error probability associated with the passenger flow data of day k. The larger its value, the larger the error likely to arise when the passenger flow data of day k is used to predict the passenger flow data of day i, and the more likely that prediction is to be inaccurate; conversely, the smaller its value, the smaller the error likely to arise, and the higher the prediction accuracy for day i.

Of the historical similarity of the date to be predicted and the adjusted similarity, the smaller value corresponds to the smaller error probability, so the smaller value is selected to represent the error probability of day k.

Step S004, obtaining a weight value for each adjacent date by applying a negative-correlation mapping to the error probability of each adjacent date, and obtaining the passenger flow prediction data of the date to be predicted from the weight values of the adjacent dates and the corresponding passenger flow data.

After the error probability of each adjacent date is obtained, the data with the larger error probability is given smaller weight, and the data with the smaller error probability is given larger weight, so as to obtain the passenger flow prediction data of the date to be predicted. The error probability and the weight are therefore inversely related.

The negative of the error probability is used as the exponent of the natural constant e to form an exponential function, and the resulting function values are normalized; the normalized result is the weight value of the corresponding adjacent date. First, the negative-correlation mapping result of each error probability is obtained: $\exp(-W_k)$, where $\exp()$ denotes the exponential function with base e. All negative-correlation mapping results are then normalized, and the normalized result is the weight value of the corresponding adjacent date; the weight value of day k is denoted $Z_{ck}$.
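The mapping from error probabilities to weights might look as follows in Python. The text only says that the mapped values are normalized; dividing by their sum is an assumption made here, and the function name and sample values are hypothetical.

```python
import math


def adjacent_date_weights(error_probs):
    """Map error probabilities W_k to weights Z_ck via exp(-W_k), then normalize.

    Normalizing by the sum is an assumption; the text only says "normalize".
    """
    mapped = [math.exp(-w) for w in error_probs]  # negative-correlation mapping
    total = sum(mapped)
    return [m / total for m in mapped]


# Hypothetical error probabilities for three adjacent dates.
weights = adjacent_date_weights([0.1, 0.4, 0.8])
```

Under this choice the weights sum to 1 and a smaller error probability always yields a larger weight, which matches the inverse relationship described above.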

The larger the weight value of a piece of data in passenger flow prediction, the higher its accuracy and the smaller its error. The passenger flow data for the remaining time of day i is therefore predicted from the passenger flow data of the adjacent dates.

Periodic data is predicted with a weighted moving average method. When this method is used, periods must be delimited according to a set period length. In the embodiment of the present invention the period is delimited according to the alternation of working days and rest days, i.e. the period of the subway passenger flow data is set to 7 days. The weight values used by the existing weighted moving average method are then obtained by presetting an initial weight value. Because the weighted moving average method requires all weights to sum to 1, the initial weight value is preset as follows: the number of data points in one period is obtained, and 1 divided by that number is taken as the initial weight value.

Although the weight of each data point in the weighted moving average method is obtained by presetting the initial weight value, the method assigns weights only over a fixed time interval. In terms of temporal influence, different ways of presetting the weights affect the data differently, so the accuracy of the final prediction varies. Moreover, the weighted moving average method relies mainly on local data and cannot capture the influence of data continuity on the current passenger flow prediction. The invention therefore performs a weighted fusion of the weight value obtained above with the initial weight value preset by the weighted moving average method to obtain a new weight value.

In the invention, the weight values in the above steps are calculated with one day as the minimum unit, so during weight fusion the same weight value is assigned to every 10 min time period within a day; for example, the weight value of every time period of day k is $Z_{ck}$.

Let the initial weight value preset by the weighted moving average method for the passenger flow data of the z-th time period of day k be $Z_{bk}$; the new weight value $Z_k$ obtained by weighted fusion for the z-th time period of day k is then:

where f is a hyperparameter between 0 and 1 that sets the relative contribution of the two weight values. As an example, f = 0.3 in the embodiment of the present invention; in other embodiments it may be adjusted according to the specific implementation scenario.

Because the weighted moving average method requires the data weight values within a period to sum to 1, the ratio of each obtained $Z_k$ to the sum of all weight values in the corresponding period is taken as the final weight value $Z'_k$; the sum of all $Z'_k$ in one period is then still 1.
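A hedged Python sketch of the fusion and renormalization follows. The fusion formula itself is not reproduced in the text, so the convex combination $Z_k = f \times Z_{ck} + (1 - f) \times Z_{bk}$ used here is purely an assumption, including which term f multiplies. The function name and sample values are hypothetical, and weights are handled per day because every time period within a day shares the same weight.

```python
def fused_weights(z_c, f=0.3):
    """Fuse per-day weights Z_ck with the uniform initial weight Z_bk.

    ASSUMPTION: Z_k = f * Z_ck + (1 - f) * Z_bk. The source does not
    reproduce the formula, so which term f multiplies is a guess.
    """
    n = len(z_c)
    z_b = 1.0 / n                        # initial weight: 1 / number of data points
    z = [f * c + (1.0 - f) * z_b for c in z_c]
    total = sum(z)
    return [v / total for v in z]        # final weights Z'_k, summing to 1


# Hypothetical Z_ck values for a 7-day period (they need not sum to 1).
final = fused_weights([0.30, 0.20, 0.15, 0.15, 0.10, 0.05, 0.05])
```

Whatever fusion is actually used, the last step is the part stated in the text: each fused weight is divided by the sum over the period so that the final weights again sum to 1.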

After all new weight values have been obtained, the passenger flow data for the remaining time periods of day i is predicted with the weighted moving average method, which completes the acquisition of the passenger flow prediction data for the date to be predicted.

It should be noted that the weighted moving average method assigns different weights to the observed values, computes a moving average from those weights, and determines the predicted value from the final moving average. It is a known technique, so the specific process of predicting passenger flow data with it is not described further in this embodiment.
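For orientation only, one common form of a weighted moving average applied per time period is sketched below in Python. The embodiment does not spell out this step, so the data layout (one list of counts per adjacent date), the function name and the sample numbers are all assumptions.

```python
def predict_remaining_periods(flows_by_day, weights, start_period):
    """Weighted average, per time period, over the adjacent dates.

    flows_by_day: one list of passenger counts per adjacent date (equal lengths)
    weights: final weight Z'_k per adjacent date, assumed to sum to 1
    start_period: index of the first time period still to be predicted
    """
    n_periods = len(flows_by_day[0])
    return [
        sum(w * day[z] for w, day in zip(weights, flows_by_day))
        for z in range(start_period, n_periods)
    ]


# Hypothetical counts for 3 adjacent dates with 4 time periods each.
flows = [
    [100, 120, 140, 160],
    [110, 130, 150, 170],
    [90, 110, 130, 150],
]
pred = predict_remaining_periods(flows, [0.5, 0.3, 0.2], start_period=2)
```

Here `pred` holds one predicted passenger count for each remaining time period of the date to be predicted, starting at `start_period`.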

In summary, in the embodiment of the present invention, the number of passengers in different time periods is collected during subway operating hours, and the sequence formed by the passenger numbers of all time periods is the passenger flow data of the corresponding date. From the passenger flow data of a preset number of days before the date to be predicted, the passenger flow data of the dates belonging to the same week value as the date to be predicted is selected to form the historical passenger flow data of the date to be predicted. The similarity between every two pieces of passenger flow data in the historical passenger flow data is obtained, a Gaussian model of all the similarities is constructed, and the historical similarity between the date to be predicted and the historical passenger flow data under the Gaussian model is obtained. The historical similarity of each adjacent date of the date to be predicted under its corresponding Gaussian model is obtained, the similar continuity is obtained from the change in historical similarity between the date to be predicted and the adjacent dates, and, when the similar continuity is greater than a preset threshold, the error probability of each adjacent date is obtained from the historical similarities of the date to be predicted and the adjacent date and from their time difference; the adjacent dates are the dates within a preset number of days before the date to be predicted. A weight value for each adjacent date is obtained by applying a negative-correlation mapping to its error probability, and the passenger flow prediction data of the date to be predicted is obtained from the weight values of the adjacent dates and the corresponding passenger flow data. The embodiment of the invention can thus still obtain accurate prediction data under emergency conditions, improving the accuracy of passenger flow prediction.

The embodiment of the invention also provides an urban rail transit passenger flow prediction system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program. Since the urban rail transit passenger flow prediction method has been described in detail above, it is not described again here.

It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts in the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.

The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Modifications of the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the spirit of the technical solutions of the embodiments of the present application shall all fall within the scope of protection of the present application.

Claims

1. A passenger flow prediction method for urban rail transit is characterized by comprising the following steps:

collecting the number of passengers at different time intervals in the subway running time, wherein the sequence formed by the number of the passengers at all the time intervals is passenger flow data at corresponding dates;

acquiring historical similarity of dates adjacent to the date to be predicted under the corresponding Gaussian model, acquiring similar continuity according to change of the historical similarity corresponding to the date to be predicted and the adjacent dates, and acquiring error probability of each adjacent date according to the historical similarity of the date to be predicted and the adjacent dates and time difference when the similar continuity is larger than a preset threshold value; the adjacent date is each date in preset days before the date to be predicted;

2. The method for predicting the passenger flow of the urban rail transit according to claim 1, wherein the process of obtaining the similarity between every two passenger flow data is as follows:

3. The urban rail transit passenger flow prediction method according to claim 2, wherein the obtaining of historical similarity between a date to be predicted and historical passenger flow data under a Gaussian model comprises:

taking each date as a node, taking the difference between every two pieces of passenger flow data as an edge weight value between corresponding nodes, and constructing a completely undirected graph by using a shortest path algorithm; acquiring the actual passenger number of the date to be predicted in all time periods in a preset time length to form an actual sequence, calculating the difference between a subsequence of passenger flow data corresponding to each node in the preset time length part and the actual sequence, and selecting the node with the maximum difference as a target node;

4. The urban rail transit passenger flow prediction method according to claim 1, characterized in that the similar continuity acquisition method is:

taking the absolute value of the difference between the historical similarity of each adjacent date and the historical similarity of the preceding adjacent date as a new historical similarity; obtaining the average value and the peak values of the new historical similarities of all adjacent dates; taking the adjacent date corresponding to the peak value closest to the date to be predicted as a target date; taking the difference between the new historical similarity of the target date and the average value as a first difference value; taking the absolute value of the difference between the new historical similarity of the date to be predicted and the new historical similarity of the target date as a second difference value; and taking the product of the first difference value and the second difference value as the similar continuity.

5. The urban rail transit passenger flow prediction method according to claim 1, characterized in that the error probability obtaining method is:

6. The method for predicting the passenger flow of the urban rail transit according to claim 1, wherein the method for obtaining the weight value comprises the following steps:

7. The urban rail transit passenger flow prediction method according to claim 1, wherein the passenger flow prediction data is obtained by:

8. An urban rail transit passenger flow prediction system comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor when executing the computer program implements the steps of the urban rail transit passenger flow prediction method according to any one of claims 1 to 7.