CN114331234B - Rail transit passenger flow prediction method and system based on passenger travel information

Info

Publication number: CN114331234B
Application number: CN202210254509.3A
Authority: CN
Inventors: 许心越; 张安忠; 蔡昌俊; 刘军
Original assignee: Beijing Jiaotong University; Guangzhou Metro Group Co Ltd
Current assignee: Beijing Jiaotong University; Guangzhou Metro Group Co Ltd
Priority date: 2022-03-16
Filing date: 2022-03-16
Publication date: 2022-07-12
Anticipated expiration: 2042-03-16
Also published as: CN114331234A

Abstract

The invention relates to the technical field of rail transit passenger flow prediction, and in particular to a rail transit passenger flow prediction method and system based on passenger travel information. The method comprises the following steps: acquiring passenger travel data, establishing a passenger travel information index system based on the passenger travel data, and statistically calculating each index in the index system; estimating the return passenger flow at a station in different time periods based on some of the calculated indexes; and adding the return passenger flow at the station as a covariate to a passenger flow prediction model to predict the inbound passenger flow. The invention mines passengers' travel patterns from their multi-source travel data and, on that basis, establishes a three-level passenger travel information index system and realizes the statistical calculation of each index. It also provides a prediction method that identifies return passenger flow from passenger travel information and uses the return passenger flow to improve the accuracy of inbound passenger flow prediction at stations.

Description

Rail transit passenger flow prediction method and system based on passenger travel information

Technical Field

The invention relates to the technical field of rail transit passenger flow prediction, in particular to a rail transit passenger flow prediction method and a rail transit passenger flow prediction system based on passenger travel information.

Background

Accurate prediction of passenger demand at stations is crucial to the operation of urban subway systems. Existing methods mainly treat the passenger flow values at several past times as a time series in order to predict the passenger flow at a future time. However, this approach largely ignores the travel behavior patterns of individual passengers. For example, a passenger who exits at a subway station in the morning to go to work is likely to enter the same station in the evening to go home. Existing research shows that it is necessary to add a travel behavior component to time series passenger flow prediction.

Under the concept of user travel information, user labels that are easy to understand, representative and meaningful are abstracted through modeling, and a set of such labels is built for each user to describe that user's behavior characteristics. Constructing passenger travel information from the travel data of individual passengers, so as to describe their travel behavior patterns, therefore makes accurate passenger flow prediction possible. At present, rail transit passenger travel information still has the following shortcomings: passengers' multi-source travel information is not deeply mined, so a large amount of data is wasted; and the index system of passenger travel information established through data analysis is incomplete and needs further development.

Disclosure of Invention

The invention aims to solve at least one technical problem in the background art, and provides a method and a system for predicting rail transit passenger flow based on passenger travel information.

In order to achieve the purpose, the invention provides a rail transit passenger flow prediction method based on passenger travel information, which comprises the following steps:

passenger travel data are acquired, a passenger travel information index system is established based on the passenger travel data, and all indexes in the passenger travel information index system are calculated in a statistical mode;

estimating the return passenger flow of different time periods in the station based on partial index data obtained by calculation in the passenger travel information index system;

and taking the return passenger flow volume of passengers in the station as a covariate to be added into the passenger flow prediction model to predict the station entering passenger flow.

Preferably, the trip data includes:

AFC card-swiping records and APP code-scanning records, used for obtaining passengers' entry and exit times and entry and exit station information;

APP registration data, used for obtaining the identity information and associated information of passengers;

APP value-added consumption data, used for obtaining the value-added service information of passengers;

POI data near the station, associated with the station and used for describing the geographic attributes of the station.

Preferably, the index information in the passenger travel information index system includes: basic information, service information and derived information;

the basic information comprises identity information and associated information, wherein the identity information comprises the APP ID, gender and age of a passenger and whether the passenger is disabled, and the associated information comprises the third-party payment mode and the city all-purpose card of the passenger;

the service information comprises trip basic information, trip derived information and value-added service information, wherein the trip basic information comprises the entry and exit times and the entry and exit station information of passengers; the trip derived information comprises average trip duration, total number of trips, daily average number of trips, trip time distribution, trip OD distribution, trip path distribution, first trip time, last trip time, holiday trip time distribution and holiday trip OD distribution; and the value-added service information comprises the number of value-added service participations, participation frequency, average transaction amount, payment mode distribution, merchant type distribution and last participation time;

the derived information comprises an activity attribute and a function attribute, wherein the activity attribute comprises travel activity, and the function attribute comprises the travel demand type of the passenger, the residential area station, the work area station and the value-added participation degree.

Preferably, the average trip duration is calculated as:

$$\bar{T}_i = \frac{1}{N_i}\sum_{n=1}^{N_i}\left(t_{i,n}^{d} - t_{i,n}^{o}\right)$$

the daily average number of trips is calculated as:

$$\bar{N}_i = \frac{N_i}{D}$$

the trip time distribution is calculated, for a time window $[\tau_1, \tau_2]$ of the day, as:

$$P_i^{[\tau_1,\tau_2]} = \frac{1}{N_i}\sum_{n=1}^{N_i} I\left(\tau_1 \le t_{i,n}^{o} \le \tau_2\right)$$

the trip OD distribution is counted as the three most frequent trip ODs of the passenger;

the first trip time $t_i^{\mathrm{first}}$ is calculated from the inbound times $t_{i,n}^{o}$ of the passenger;

the last trip time is calculated as:

$$t_i^{\mathrm{last}} = \max_{1 \le n \le N_i} t_{i,n}^{o}$$

the holiday trip time distribution is calculated in the same way over holiday trips only:

$$P_i^{\mathrm{hol},[\tau_1,\tau_2]} = \frac{1}{N_i^{\mathrm{hol}}}\sum_{n \in \mathrm{holiday\ trips}} I\left(\tau_1 \le t_{i,n}^{o} \le \tau_2\right)$$

the holiday trip OD distribution is counted as the three most frequent trip ODs of the passenger on holidays;

in the above formulas, $n$ denotes the $n$-th trip, $d$ the outbound station, $o$ the inbound station, $i$ the passenger and $t$ the time; $t_{i,n}^{d}$ is the time of the $n$-th exit of passenger $i$; $t_{i,n}^{o}$ is the time of the $n$-th entry of passenger $i$; $\bar{T}_i$ is the average trip duration of passenger $i$; $N_i$ is the total number of historical trips of passenger $i$; $\bar{N}_i$ is the daily average number of trips of passenger $i$; $D$ is the total number of days within the statistical period; $I(\cdot)$ is a binary indicator function that equals 1 when the condition is satisfied and 0 otherwise; $t_i^{\mathrm{first}}$ is the first trip time of passenger $i$; $t_i^{\mathrm{last}}$ is the last trip time of passenger $i$; and $N_i^{\mathrm{hol}}$ is the total number of trips of the passenger on holidays within the statistical period.

Preferably, the participation frequency $F_i$, the frequency with which passenger $i$ participates in the value-added service, is calculated from $M_i$, the number of times passenger $i$ participates in the value-added service;

the merchant type distribution is counted as the three merchant types in which the passenger participates most frequently;

the average transaction amount is calculated as:

$$\bar{A}_i = \frac{A_i}{M_i}$$

wherein $\bar{A}_i$ is the average amount spent by passenger $i$ per participation in the value-added service, and $A_i$ is the total amount spent by passenger $i$ on the value-added service;

and the payment mode distribution is counted as the three payment modes the passenger uses most frequently.

Preferably, the travel demand type is determined by the clustering result of the total number of trips, the first trip time and the average trip duration counted from passengers' inbound card-swiping data, and the clustering method comprises the following steps:

dividing the passengers into different categories according to their travel characteristics by the K-means algorithm, selecting the total number of historical trips $N_i$, the first trip time $t_i^{\mathrm{first}}$ and the average trip duration $\bar{T}_i$ of passenger $i$ in the passenger travel information index system as the indexes for clustering the passengers at a station, and determining the number of clusters K by the elbow method, wherein the key index of the elbow method is the within-cluster sum of squared errors (SSE), calculated as:

$$\mathrm{SSE} = \sum_{k=1}^{K}\sum_{x \in C_k} \left\lVert x - \mu_k \right\rVert^{2}$$

wherein $C_k$ represents the $k$-th cluster and $\mu_k$ is the center point of $C_k$;

the probability that station $e$ is the residential area station of passenger $i$ is calculated as:

$$P_{i,e}^{\mathrm{home}} = \frac{n_{i,e}^{\mathrm{in,wd,<12}} + n_{i,e}^{\mathrm{in,rd,<16}} + n_{i,e}^{\mathrm{out,wd,>12}} + n_{i,e}^{\mathrm{out,rd,>16}}}{N_{i,e}}$$

the probability that station $e$ is the work area station of passenger $i$ is calculated as:

$$P_{i,e}^{\mathrm{work}} = \frac{n_{i,e}^{\mathrm{out,wd,<12}} + n_{i,e}^{\mathrm{in,wd,>12}}}{N_{i,e}^{\mathrm{wd}}}$$

wherein:

$P_{i,e}^{\mathrm{home}}$ represents the probability that station $e$ is the residential area station of passenger $i$;

$P_{i,e}^{\mathrm{work}}$ represents the probability that station $e$ is the work area station of passenger $i$;

$N_{i,e}$ represents the total number of entries and exits of passenger $i$ at station $e$;

$n_{i,e}^{\mathrm{in,wd,<12}}$ represents the number of entries of passenger $i$ at station $e$ before 12:00 on working days;

$n_{i,e}^{\mathrm{in,rd,<16}}$ represents the number of entries of passenger $i$ at station $e$ before 16:00 on rest days;

$n_{i,e}^{\mathrm{in,wd,>12}}$ represents the number of entries of passenger $i$ at station $e$ after 12:00 on working days;

$n_{i,e}^{\mathrm{in,rd,>16}}$ represents the number of entries of passenger $i$ at station $e$ after 16:00 on rest days;

$N_{i,e}^{\mathrm{wd}}$ represents the total number of entries and exits of passenger $i$ at station $e$ on working days;

$n_{i,e}^{\mathrm{out,wd,<12}}$ represents the number of exits of passenger $i$ at station $e$ before 12:00 on working days;

$n_{i,e}^{\mathrm{out,wd,>12}}$ represents the number of exits of passenger $i$ at station $e$ after 12:00 on working days;

$n_{i,e}^{\mathrm{out,rd,>16}}$ represents the number of exits of passenger $i$ at station $e$ after 16:00 on rest days;

$n_{i,e}^{\mathrm{out,rd,<16}}$ represents the number of exits of passenger $i$ at station $e$ before 16:00 on rest days;

the value-added participation degree takes one of three values, strong, medium and low: when the participation frequency is greater than 0.7, the value-added participation degree of the passenger is strong; when the participation frequency is less than 0.4, it is low; and when the participation frequency is between 0.4 and 0.7, it is medium.
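The threshold rule above can be sketched as a small function. This is only an illustration: the function name `participation_level` is hypothetical, and treating the boundary values 0.4 and 0.7 as "medium" is an assumption, since the text only says "between 0.4 and 0.7".

```python
def participation_level(frequency: float) -> str:
    """Map a participation frequency to the three-level participation degree.

    Assumption: the boundary values 0.4 and 0.7 themselves count as "medium".
    """
    if frequency > 0.7:
        return "strong"
    if frequency < 0.4:
        return "low"
    return "medium"
```

For example, `participation_level(0.85)` falls in the strong band and `participation_level(0.5)` in the medium band.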

Preferably, estimating the return passenger flow at the station in different time periods based on some of the calculated indexes in the passenger travel information index system comprises:

counting the return passenger flow $R_{s,t}^{v}$ according to the entry and exit times, the residential area station, the work area station and the travel demand type in the passenger travel information index system, wherein $s$ is a station, $v$ is the day of the week (1 to 7), $t$ is a time period, and $R_{s,t}^{v}$ is the number of passengers returning from station $s$ within time period $t$ on day $v$ of the week;

selecting historical exit and return passenger flow data at station $s$ and obtaining, by averaging over the historical weeks, the conditional probability distribution that a passenger who arrives (exits) at station $s$ in time period $a$ on day $v$ of the week departs on the return trip from station $s$ in time period $b$; for each historical week $j$, the calculation uses the number of passengers exiting at station $s$ in time period $a$ on day $v$ of week $j$, together with their entry times and exit times;

estimating $R_{s,t+1}^{v}$ from this probability distribution and the number of passengers exiting at station $s$ in each earlier time period on day $v$ of the week, wherein $H$ represents the maximum interval between a passenger's exit from and entry into station $s$ and is set to 24 hours, $h$ represents the time slot resolution, and $t+1$ represents the time period following time period $t$.
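The two steps above (averaging a conditional return probability over historical weeks, then weighting today's exits by it) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's exact formula: the record layout `(exit_slot, return_slot)`, the function names, and the convention that only exits in the preceding `max_slots = H / h` slots contribute are all assumptions made for illustration.

```python
from collections import defaultdict

def return_probabilities(weeks):
    """Average, over historical weeks, the share of passengers who exited the
    station in slot a and re-entered it in slot b (same day of the week).

    weeks: one list per historical week; each item is (exit_slot, return_slot),
    with return_slot set to None when no return trip was observed.
    """
    sums = defaultdict(float)
    for records in weeks:
        exits = defaultdict(int)
        pairs = defaultdict(int)
        for a, b in records:
            exits[a] += 1
            if b is not None:
                pairs[(a, b)] += 1
        for (a, b), count in pairs.items():
            sums[(a, b)] += count / exits[a]
    return {key: total / len(weeks) for key, total in sums.items()}

def estimate_return_flow(prob, exits_by_slot, target_slot, max_slots):
    """Expected number of returning passengers in target_slot, summing exits in
    the preceding max_slots slots weighted by their return probability."""
    first = max(0, target_slot - max_slots)
    return sum(
        exits_by_slot.get(a, 0) * prob.get((a, target_slot), 0.0)
        for a in range(first, target_slot)
    )
```

With hourly slots, `max_slots` would be 24 to match the 24-hour maximum interval stated above.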

Preferably, adding the return passenger flow at the station as a covariate to the passenger flow prediction model to predict the inbound passenger flow comprises:

adding the estimated return passenger flow $R_{s,t}^{v}$ to a standard seasonal autoregressive integrated moving average (seasonal ARIMA) model to predict the inbound passenger flow;

the seasonal ARIMA model is written ARIMA(p, d, q)(P, D, Q)[Ω], where p, d and q are the autoregressive, differencing and moving average orders of the non-seasonal part; P, D and Q are the autoregressive, differencing and moving average orders of the seasonal part; and Ω is the number of periods per season;

for a time series $\{y_t\}$, the ARIMA(p, d, q)(P, D, Q)[Ω] model is:

$$\phi_p(B)\,\Phi_P(B^{\Omega})\,(1-B)^{d}\,(1-B^{\Omega})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{\Omega})\,\varepsilon_t$$

wherein $B$ is the backshift operator defined by $B y_t = y_{t-1}$, and

$$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p},$$

$$\Phi_P(B^{\Omega}) = 1 - \Phi_1 B^{\Omega} - \cdots - \Phi_P B^{P\Omega},$$

$$\theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q},$$

$$\Theta_Q(B^{\Omega}) = 1 + \Theta_1 B^{\Omega} + \cdots + \Theta_Q B^{Q\Omega},$$

wherein $\phi$, $\Phi$, $\theta$ and $\Theta$ are the coefficients to be estimated, $\varepsilon_t$ is a white-noise error term following a normal distribution with mean 0 and variance $\sigma^{2}$, and $t$ represents the $t$-th time period;

when the return passenger flow $R_{s,t}^{v}$ is used as a covariate, the inbound passenger flow $y_t$ and the return passenger flow $R_{s,t}^{v}$ satisfy the following relationship:

$$y_t = \beta R_{s,t}^{v} + \eta_t$$

wherein $\beta$ is the regression coefficient; $R_{s,t}^{v}$ is taken for the given day of the week $v$ and the known station $s$ over the time periods $1, \ldots, t$; and $\eta_t$ follows the ARIMA(p, d, q)(P, D, Q)[Ω] model and represents the part of the total inbound passenger flow other than the return passenger flow. From the station's historical $y_t$ and $R_{s,t}^{v}$, $\beta$ and the series $\eta_1, \ldots, \eta_t$ are calculated; $\eta_{t+1}$ is then predicted with the ARIMA model; $R_{s,t+1}^{v}$ is obtained from the return passenger flow estimate for the given $v$ and $s$ at time period $t+1$; and substituting $R_{s,t+1}^{v}$ and $\eta_{t+1}$ into the relationship above gives the predicted inbound passenger flow $y_{t+1}$ at time period $t+1$.
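The prediction sequence described above (estimate the regression coefficient, model the remainder, forecast one step, add back the covariate term) can be illustrated with a deliberately simplified, standard-library-only sketch. Two loud assumptions: the coefficient is fitted by ordinary least squares in a separate first step, and the remainder series is modelled with a plain AR(1) as a stand-in for the full seasonal ARIMA model. In practice a library such as statsmodels, whose `SARIMAX` class accepts an `exog` argument, would estimate both jointly. All function names are hypothetical.

```python
def fit_beta(y, x):
    """Least-squares slope for y = beta * x + eta (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def fit_ar1(series):
    """Mean and lag-1 coefficient of a simple AR(1) fit.

    Stand-in (assumption) for the seasonal ARIMA model of the remainder series.
    """
    mean = sum(series) / len(series)
    centred = [v - mean for v in series]
    num = sum(centred[t] * centred[t - 1] for t in range(1, len(centred)))
    den = sum(v * v for v in centred[:-1])
    return mean, (num / den if den else 0.0)

def forecast_inbound(y, x, x_next):
    """One-step forecast of inbound flow given the next return-flow estimate."""
    beta = fit_beta(y, x)
    eta = [a - beta * b for a, b in zip(y, x)]      # flow not explained by returns
    mean, phi = fit_ar1(eta)
    eta_next = mean + phi * (eta[-1] - mean)        # AR(1) one-step forecast
    return beta * x_next + eta_next
```

Here `y` is the historical inbound flow, `x` the historical return flow, and `x_next` the estimated return flow for the next time period.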

In order to achieve the above object, the present invention provides a rail transit passenger flow prediction system based on passenger travel information, comprising:

the index acquisition module is used for acquiring passenger travel data, establishing a passenger travel information index system based on the passenger travel data, and counting and calculating each index in the passenger travel information index system;

the return passenger flow calculation module is used for estimating the return passenger flow at different time intervals in the station based on part of index data calculated in the passenger travel information index system;

and the passenger flow prediction module is used for adding the return passenger flow of passengers in the station into the passenger flow prediction model as a covariate to predict the station entering passenger flow.

In order to achieve the above object, the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and running on the processor, wherein the computer program, when executed by the processor, implements the method for predicting rail transit passenger flow based on passenger travel information as described in any one of the above.

To achieve the above object, the present invention provides a computer-readable storage medium storing thereon a computer program, which when executed by a processor, implements a method for predicting rail transit passenger flow based on passenger travel information as described in any one of the above.

The invention has the beneficial effects that:

1. The rail transit passenger flow prediction method based on passenger travel information provided by the invention builds on smart subway construction: it associates, fuses and introduces multi-source subway-related data, establishes passenger travel information, and explores the application of passenger travel information to passenger flow prediction;

2. The method mines the travel patterns of passengers from their multi-source travel data, establishes a three-level passenger travel information index system on that basis, and realizes the statistical calculation of each index;

3. The method identifies return passenger flow from passenger travel information and uses the return passenger flow to improve the accuracy of inbound passenger flow prediction at stations.

Drawings

Fig. 1 schematically shows a flow chart of a method for predicting rail transit passenger flow based on passenger travel information according to the present invention;

fig. 2 is a block diagram schematically showing the structure of a rail transit passenger flow prediction system based on passenger travel information according to the present invention.

Detailed Description

The content of the invention will now be discussed with reference to exemplary embodiments. It is to be understood that the embodiments discussed are merely intended to enable one of ordinary skill in the art to better understand and thus implement the teachings of the present invention, and do not imply any limitations on the scope of the invention.

As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".

Fig. 1 schematically shows a flow chart of a method for predicting rail transit passenger flow based on passenger travel information according to the present invention. As shown in fig. 1, the method for predicting rail transit passenger flow based on passenger travel information according to the present invention comprises the following steps:

a. passenger travel data are acquired, a passenger travel information index system is established based on the passenger travel data, and all indexes in the passenger travel information index system are calculated in a statistical mode;

b. estimating the return passenger flow of different time periods in the station based on partial index data obtained by calculation in the passenger travel information index system;

c. and taking the return passenger flow volume of passengers in the station as a covariate to be added into the passenger flow prediction model to predict the station entering passenger flow.

According to an embodiment of the invention, in step a, when a passenger enters a station to ride, the travel data of that passenger can be obtained from the passenger's card-swiping record and/or code-scanning record. Working in PyCharm, the data are read by connecting to the passenger travel information database with Python, and the corresponding statistical analysis is carried out. In this embodiment, the travel data include: AFC card-swiping records and APP code-scanning records, used for obtaining passengers' entry and exit times and entry and exit station information; APP registration data, used for obtaining the identity information and associated information of passengers; APP value-added consumption data, used for obtaining the value-added service information of passengers; and POI data near the station, associated with the station and used for describing the geographic attributes of the station. In this embodiment, travel data are counted from the date on which the passenger registered in the APP. The POI data near a station cover the land-use types within a circle of radius 500 meters centered on the station, and are acquired through the third-party map software Amap (Gaode Map).
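The 500-meter POI coverage rule can be sketched as a simple radius filter. This is an illustrative sketch only: the record layout (dictionaries with `lat`, `lon` and `type` keys) and the function names are assumptions, and distances are computed with the haversine formula on latitude/longitude pairs that are assumed to share one coordinate system.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def land_use_counts(station, pois, radius_m=500.0):
    """Count POI land-use types that fall within radius_m of the station."""
    counts = {}
    for poi in pois:
        dist = haversine_m(station["lat"], station["lon"], poi["lat"], poi["lon"])
        if dist <= radius_m:
            counts[poi["type"]] = counts.get(poi["type"], 0) + 1
    return counts
```

The resulting counts per land-use type are one possible way to describe the geographic attributes of a station.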

In this embodiment, the passenger travel information index system represents passenger information and comprises three first-level indexes: basic information, service information and derived information. The basic information comprises two second-level indexes: identity information and associated information. The identity information comprises the third-level indexes APP ID, gender, age and whether the passenger is disabled; the associated information comprises the third-level indexes third-party payment mode and city all-purpose card of the passenger.

The service information comprises three second-level indexes: trip basic information, trip derived information and value-added service information. The trip basic information comprises the third-level indexes passenger entry and exit times and passenger entry and exit station information. The trip derived information comprises the third-level indexes average trip duration, total number of trips, daily average number of trips, trip time distribution, trip OD distribution, trip path distribution, first trip time, last trip time, holiday trip time distribution and holiday trip OD distribution. The value-added service information comprises the third-level indexes number of value-added service participations, participation frequency, average transaction amount, payment mode distribution, merchant type distribution and last participation time.

The derived information comprises two second-level indexes: an activity attribute and a function attribute. The activity attribute comprises the third-level index travel activity; the function attribute comprises the third-level indexes passenger travel demand type, residential area station, work area station and value-added participation degree.

Further, in this embodiment, statistically calculating each index in the passenger travel information index system includes: counting all indexes in the identity information, associated information and trip basic information of the passenger; counting the total number of trips, trip OD distribution, holiday trip OD distribution and trip path distribution in the trip derived information; and counting the number of participations, merchant type distribution, payment mode distribution and last participation time in the value-added service information. The invention places no special requirement on the counting method, as long as the relevant index information can be acquired and stored in summarized form. These indexes are data generated directly by passenger actions such as registration and travel and can be obtained without calculation, so they only need to be summarized.

Further, in this embodiment, statistically calculating each index in the passenger travel information index system also includes calculating the remaining third-level indexes, specifically:

calculating the average trip duration, which is the average time spent by passenger $i$ on each trip:

$$\bar{T}_i = \frac{1}{N_i}\sum_{n=1}^{N_i}\left(t_{i,n}^{d} - t_{i,n}^{o}\right)$$

calculating the daily average number of trips, which is the number of trips of passenger $i$ per day:

$$\bar{N}_i = \frac{N_i}{D}$$

calculating the trip time distribution, which is the ratio of the number of trips of passenger $i$ in the morning peak (7:00 to 9:00), the evening peak (17:00 to 19:00) and the off-peak period to the total number of trips; taking the morning peak as an example:

$$P_i^{\mathrm{am}} = \frac{1}{N_i}\sum_{n=1}^{N_i} I\left(7{:}00 \le t_{i,n}^{o} \le 9{:}00\right)$$

counting the trip OD distribution, namely the three most frequent trip ODs of the passenger;

calculating the first trip time $t_i^{\mathrm{first}}$ of passenger $i$ from the inbound times $t_{i,n}^{o}$;

calculating the last trip time, which is the time of the last trip of passenger $i$ within the statistical period and is used to judge the activity of the passenger (i.e. the travel activity in the derived information):

$$t_i^{\mathrm{last}} = \max_{1 \le n \le N_i} t_{i,n}^{o}$$

calculating the holiday trip time distribution, which is the ratio of the number of holiday trips of the passenger in the morning peak (7:00 to 9:00), the evening peak (17:00 to 19:00) and the off-peak period to the total number of holiday trips; taking the morning peak as an example:

$$P_i^{\mathrm{hol,am}} = \frac{1}{N_i^{\mathrm{hol}}}\sum_{n \in \mathrm{holiday\ trips}} I\left(7{:}00 \le t_{i,n}^{o} \le 9{:}00\right)$$

counting the holiday trip OD distribution, namely the three most frequent trip ODs of the passenger on holidays;

in the above formulas, $n$ denotes the $n$-th trip, $d$ the outbound station, $o$ the inbound station, $i$ the passenger and $t$ the time; $t_{i,n}^{d}$ is the time of the $n$-th exit of passenger $i$; $t_{i,n}^{o}$ is the time of the $n$-th entry of passenger $i$; $\bar{T}_i$ is the average trip duration of passenger $i$; $N_i$ is the total number of historical trips of passenger $i$; $\bar{N}_i$ is the daily average number of trips; $D$ is the total number of days within the statistical period; $I(\cdot)$ is a binary indicator function equal to 1 when the condition is satisfied and 0 otherwise; $N_i^{\mathrm{hol}}$ is the total number of holiday trips; $t_i^{\mathrm{first}}$ is the first trip time of passenger $i$; and $t_i^{\mathrm{last}}$ is the last trip time of passenger $i$.
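The trip-derived indexes described above can be computed from a passenger's list of trips roughly as follows. This is a minimal sketch under assumptions: each trip is represented as an `(entry_time, exit_time)` pair of `datetime` objects, a trip is assigned to the morning peak by its entry time using a half-open 7:00 to 9:00 window, and the function and key names are hypothetical.

```python
from datetime import datetime, time

def travel_indices(trips, n_days):
    """Trip-derived indexes for one passenger.

    trips: list of (entry_time, exit_time) datetime pairs, one per trip.
    n_days: number of days in the statistical period.
    """
    n = len(trips)
    total_min = sum((out - inn).total_seconds() for inn, out in trips) / 60
    # Assumption: peak membership is decided by the entry time of the trip.
    am_peak = sum(1 for inn, _ in trips if time(7, 0) <= inn.time() < time(9, 0))
    return {
        "total_trips": n,
        "avg_duration_min": total_min / n,
        "daily_avg_trips": n / n_days,
        "am_peak_share": am_peak / n,
        "last_trip_time": max(inn for inn, _ in trips),
    }
```

The evening-peak, off-peak and holiday shares follow the same pattern with a different time window or a holiday filter on the trip list.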

Calculating the participation frequency $F_i$, which measures how frequently passenger $i$ participates in the value-added service and is calculated from $M_i$, the number of times passenger $i$ participates in the value-added service;

counting the merchant type distribution, namely the three merchant types in which the passenger participates most frequently;

calculating the average transaction amount as:

$$\bar{A}_i = \frac{A_i}{M_i}$$

wherein $\bar{A}_i$ is the average amount spent by passenger $i$ per participation in the value-added service and $A_i$ is the total amount spent by passenger $i$ on the value-added service;

counting the payment mode distribution, namely the three payment modes the passenger uses most frequently.

The travel demand type is determined by the clustering result of the total number of trips, the first trip time and the average trip duration counted from passengers' inbound card-swiping data. The clustering method is as follows:

The K-means algorithm is used to divide passengers into categories according to their travel characteristics. The total number of historical trips $N_i$, the first trip time $t_i^{\mathrm{first}}$ and the average trip duration $\bar{T}_i$ of passenger $i$ are selected as the indexes for clustering the passengers at a station. The number of clusters K is determined by the elbow method, whose key index is the within-cluster sum of squared errors (SSE):

$$\mathrm{SSE} = \sum_{k=1}^{K}\sum_{x \in C_k} \left\lVert x - \mu_k \right\rVert^{2}$$

wherein $C_k$ represents the $k$-th cluster and $\mu_k$ is the center point of $C_k$.

In this embodiment, the travel demand type is the clustering result of the three indexes counted from passengers' inbound card-swiping data, and the number of clusters is obtained from the SSE formula. For each class of passengers, the number of historical trips and the first trip time are analyzed. For example, if the number of historical trips is large relative to the number of days in the statistical period and the first trip time falls in the morning peak, the class can be considered commuting passengers. The interpretation depends on the specific clustering result.
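The clustering and elbow procedure can be sketched with a small standard-library K-means. This is an illustration under assumptions rather than the embodiment's implementation: the feature vectors are assumed to be pre-scaled tuples (for example standardized trip count, first trip time in minutes and average duration), initial centers are drawn with a fixed random seed, and the function names are hypothetical. A library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, clusters, sse)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster sum of squared errors, the elbow method's key index.
    sse = sum(sq_dist(p, centers[j]) for j, c in enumerate(clusters) for p in c)
    return centers, clusters, sse

def elbow_curve(points, ks):
    """SSE for each candidate K; the 'elbow' is where the decrease flattens."""
    return {k: kmeans(points, k)[2] for k in ks}
```

Plotting the values returned by `elbow_curve` against K and picking the point where the curve bends is one common way to read off the number of clusters.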

Example: Passengers at one subway station are taken as the research objects. AFC data from three working days, June 6, 7 and 8, 2018, are selected as the basic data, and the working-day travel behavior characteristics of passengers at the station are analyzed. After data screening, the number of passengers entering the station over the three working days is 197,328.

The passengers were classified into 5 categories in total by the K-means clustering method. Table 1 below is the cluster center points for the five classes.

TABLE 1

And (3) analyzing a clustering result:

The first class accounts for 21.2% of passengers. The number of trips in three days is 1.75, the highest travel intensity of the five classes; the first trip time is 08:22:13 and the average trip duration is 27.7 min, so the travel distance is short and the time falls in the morning peak. This class can be regarded as typical morning-peak commuters.

The second class accounts for 10.2% of passengers. The number of trips in three days is 1.34, a moderate travel intensity; the first trip time is 11:29:33 and the average trip duration is 48.1 min, so the travel distance is long and the share is small. This class can be regarded as passengers on business trips or long-distance journeys. Combined with the POI data, there are several bus stations and railway stations near the station, in particular Beijing North Railway Station, which makes such journeys convenient.

The third class accounts for 34.5% of passengers. The number of trips in three days is 1.69, a high travel intensity; the first trip time is 17:39:14 and the average trip duration is 37.9 min, a moderate travel distance compared with the other classes, and the time falls in the evening peak. This class can be regarded as typical evening-peak commuters. It is also the largest of the five classes, which indicates that Xizhimen Station has many inbound passengers in the evening peak. Combined with the POI data, there are many office areas near the station, which supports this interpretation.

The fourth class accounts for 17.2% of passengers. The number of trips in three days is 1.22, the lowest travel intensity, which indicates low travel loyalty; the first trip time is 20:39:40 and the average trip duration is 37.1 min, a moderate travel distance compared with the other classes, with a late travel time. This class can be regarded as leisure passengers. Combined with the POI data, there are many shopping and catering merchants near the station, so these trips can be regarded as passengers going home after consumption.

The fifth class accounts for 17.1% of passengers. The number of trips in three days is 1.25, a low travel intensity; the first trip time is 14:05:07 and the average trip duration is 29.2 min, so the travel distance is short. As with the fourth class, there are no distinctive features, and the share is very close to that of the fourth class. This class can also be regarded as leisure passengers.

To determine the residential area station, 12:00 noon is used as the demarcation point on working days and 16:00 on rest days. The number of entries and exits of a passenger at a station in the corresponding time intervals is counted as shown in Table 2 below:

TABLE 2

The station serving a passenger's residential area is generally the origin station of the passenger's first trip and the destination station of the passenger's last trip in a day. The probability that station e is the residential-area station of passenger i is therefore calculated from the passenger's entry and exit counts at station e (the formula itself is not reproduced in this text);
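As an illustration of the idea that the residential-area station is usually the origin of the first trip and the destination of the last trip of a day, the following Python sketch scores a station by how often it plays those two roles. The scoring rule, the function name `home_station_score` and the trip tuple layout are assumptions made for this example; they are not the patent's formula, which is not reproduced in this text.

```python
from collections import defaultdict
from datetime import datetime


def home_station_score(trips, station):
    """Share of daily 'first origin' and 'last destination' roles held by `station`.

    trips: list of (origin, entry_time, destination, exit_time) for one passenger.
    Returns a value in [0, 1]; this scoring rule is an illustrative assumption.
    """
    by_day = defaultdict(list)
    for origin, t_in, dest, _t_out in trips:
        by_day[t_in.date()].append((t_in, origin, dest))
    if not by_day:
        return 0.0
    hits = 0
    for day_trips in by_day.values():
        day_trips.sort(key=lambda trip: trip[0])  # order trips by entry time
        first_origin = day_trips[0][1]
        last_destination = day_trips[-1][2]
        hits += (first_origin == station) + (last_destination == station)
    return hits / (2 * len(by_day))
```

A work-area score could be sketched the same way by counting morning destinations and afternoon origins on working days.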

Determining the work-area station: the station serving a passenger's work area is typically the station used as the destination before 12:00 on a working day and as the origin after 12:00. The probability that station e is the work-area station of passenger i is therefore calculated from the corresponding entry and exit counts at station e (the formula itself is not reproduced in this text).

In addition, in the present embodiment, a specific format of a part of the index labels in the passenger travel information index system is shown in table 3 below:

TABLE 3

In the present embodiment, the index data in the passenger travel information index system should be updated continuously as the passenger travels. The updating rules for the passenger travel information are as follows. Basic information is updated only when the passenger modifies his or her personal information. For the service information, the travel basic information, the travel derived information and the value-added service information are updated in real time with each trip and each use of a value-added service, and the service information held in Redis is synchronized to the database every month. The derived information is obtained by analyzing the basic information and the service information and is updated once a month. The last trip time of each passenger is checked every half year; if the difference between the passenger's last trip time and the update time exceeds half a year, the passenger is judged to be an inactive user and the passenger's travel information is deleted from the database.
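The half-yearly inactivity check described above can be sketched as follows. This is a minimal illustration: the function name `prune_inactive`, the dictionary layout and the approximation of "half a year" as 182 days are assumptions, not details given in the patent.

```python
from datetime import datetime, timedelta

# Assumption: "half a year" is approximated as 182 days.
HALF_YEAR = timedelta(days=182)


def prune_inactive(last_trip_by_passenger, update_time, threshold=HALF_YEAR):
    """Keep only passengers whose last trip is within `threshold` of `update_time`.

    last_trip_by_passenger: dict mapping a passenger id to the datetime of the last trip.
    Returns a new dict; passengers judged inactive are left out.
    """
    return {
        passenger_id: last_trip
        for passenger_id, last_trip in last_trip_by_passenger.items()
        if update_time - last_trip <= threshold
    }
```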

According to an embodiment of the present invention, in step a, after the passenger travel data is acquired, the method further includes preprocessing the travel data, which specifically includes:

redundant data processing: duplicate records may occur when a passenger swipes a card several times or when equipment fails, and the duplicate records need to be deleted;

error data processing: abnormal data may occur due to passenger behavior or equipment failure. Abnormal data is identified by three criteria: first, a passenger's entry time must be earlier than the exit time; second, a passenger's stay in the rail transit system must be less than 4 hours; third, the number of times a passenger enters the same station within one day is checked, because station staff enter and exit many times a day, and staff records are removed from the statistics.
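The preprocessing rules above can be sketched in Python as follows. The record layout (dictionaries with `card_id`, `entry_station`, `entry_time`, `exit_station`, `exit_time`) and the staff threshold `max_entries` are hypothetical; the patent states the 4-hour limit but does not give a numeric threshold for identifying staff.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_STAY = timedelta(hours=4)  # stay in the system must be shorter than this


def preprocess(records, max_entries=6):
    """Remove duplicate, inconsistent and staff records.

    max_entries is a hypothetical cut-off: a card entering the same station
    more than this many times in one day is treated as a staff card.
    """
    # 1. Redundant data: drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = (r["card_id"], r["entry_station"], r["entry_time"],
               r["exit_station"], r["exit_time"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Error data: entry must precede exit, and the stay must be under 4 hours.
    valid = [r for r in unique
             if r["entry_time"] < r["exit_time"]
             and r["exit_time"] - r["entry_time"] < MAX_STAY]
    # 3. Staff: too many entries at the same station on the same day.
    entries = Counter((r["card_id"], r["entry_station"], r["entry_time"].date())
                      for r in valid)
    staff = {card for (card, _station, _day), n in entries.items() if n > max_entries}
    return [r for r in valid if r["card_id"] not in staff]
```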

The passenger categories contained in the travel demand type, a third-level index in the passenger travel information index system, include:

commuting passengers: because of work requirements, the travel time and travel frequency of commuting passengers are relatively fixed;

tourist passengers: the travel time and travel frequency of this class fluctuate strongly, the travel frequency within a short time is high, and the travel OD pairs are widely distributed;

leisure and entertainment passengers: the travel times of this class are concentrated on weekends and in the off-peak periods of each day;

special passengers: for example the elderly, the disabled and pregnant women, who often need assistance while travelling; this information must be provided by the passenger when registering the APP account;

other passengers: passengers who do not belong to the four types above; their travel time and travel frequency are irregular and their travel purposes are varied.

According to an embodiment of the present invention, in step b, estimating the return passenger flow in different time periods in the station based on the passenger travel information index system and the calculated partial index data includes:

according to the entry and exit times, the residential-area station, the work-area station and the travel demand type in the passenger travel information index system, counting the number of people returning from station s within time period t on day v of the week, where s is a station, v is a day of the week with a value range of 1 to 7 representing Monday to Sunday, and t is a time period;

calculating the conditional probability distribution that a passenger who arrives at station s in time period a departs from station s on the return trip in time period b. The quantities in the calculation formula are the total number of weeks, the time periods a and b, the number of passengers alighting at station s in the given time period on day v of the j-th week, the arrival time at the station and the outbound time;

estimating the return passenger flow from this probability distribution for day v of the week.
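One way to read the estimation step above is sketched below: the conditional probability of returning in period b given alighting in period a is the pooled count of such returns over the observed weeks divided by the pooled count of alightings in period a, and the return flow in a period is the probability-weighted sum of the day's alightings. The exact formula is not recoverable from this text, so this estimator and the names `return_probability` and `estimate_return_flow` are assumptions.

```python
from collections import Counter


def return_probability(history):
    """Estimate P(return in period b | alighted in period a) for one station and weekday.

    history: iterable of (a, b) pairs pooled over the observed weeks, one pair per
    alighting passenger; b is None when the passenger did not return from the station.
    """
    alighted = Counter()
    returned = Counter()
    for a, b in history:
        alighted[a] += 1
        if b is not None:
            returned[(a, b)] += 1
    return {(a, b): n / alighted[a] for (a, b), n in returned.items()}


def estimate_return_flow(prob, alighted_today, t):
    """Expected number of passengers returning in period t.

    alighted_today: dict mapping an alighting period a to today's alighting count.
    """
    return sum(p * alighted_today.get(a, 0)
               for (a, b), p in prob.items() if b == t)
```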

According to an embodiment of the present invention, in step c, adding the return passenger flow of passengers in the station as a covariate to the passenger flow prediction model to predict the station inbound passenger flow includes:

adding the estimated return passenger flow to a commonly used seasonal autoregressive moving average model (S-ARIMA model) to predict the station inbound passenger flow;

the S-ARIMA model is ARIMA(p, d, q)(P, D, Q)[Ω], where p, d and q are the orders of the non-seasonal autoregressive, differencing and moving-average parts, respectively, and P, D and Q are the orders of the autoregressive, differencing and moving-average parts of the seasonal component;

for a time series, the ARIMA(p, d, q)(P, D, Q)[Ω] model is expressed in terms of the operator B and the autoregressive and moving-average polynomials of the non-seasonal and seasonal parts, whose coefficients are to be estimated; the error term follows white noise and obeys a normal distribution with mean 0 and a given variance; and Ω represents the seasonal time period;
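For reference, the seasonal ARIMA model is commonly written in the textbook form below. It is given here as an assumption, because the patent's own formula images are not reproduced in this text: the symbols $y_t$, $\phi_i$, $\Phi_i$, $\theta_i$, $\Theta_i$, $\varepsilon_t$ and $\sigma^2$ are conventional choices, and the sign convention of the moving-average terms differs between references.

```latex
\begin{align}
\phi_p(B)\,\Phi_P(B^{\Omega})\,(1-B)^{d}\,(1-B^{\Omega})^{D}\,y_t
  &= \theta_q(B)\,\Theta_Q(B^{\Omega})\,\varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^{2}) \\
B\,y_t &= y_{t-1} \\
\phi_p(B) &= 1 - \sum_{i=1}^{p} \phi_i B^{i} \\
\Phi_P(B^{\Omega}) &= 1 - \sum_{i=1}^{P} \Phi_i B^{i\Omega} \\
\theta_q(B) &= 1 + \sum_{i=1}^{q} \theta_i B^{i} \\
\Theta_Q(B^{\Omega}) &= 1 + \sum_{i=1}^{Q} \Theta_i B^{i\Omega}
\end{align}
```

Here $B$ is the backshift operator and $\Omega$ is the number of time periods in one season.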

when the return passenger flow is used as a covariate, the inbound passenger flow equals the return passenger flow multiplied by a regression coefficient plus a residual series. The residual series obeys the ARIMA(p, d, q)(P, D, Q)[Ω] model and represents the passenger flow other than the return passenger flow in the total inbound passenger flow. The regression coefficient and the residual series are calculated from the station's historical inbound passenger flow and return passenger flow. After the conditional probability distribution is obtained, the return passenger flow of time period t + 1 is predicted by the estimation formula above, with the day of the week v and the station s known and the time period taken as t + 1. Because the residual series obeys the ARIMA(p, d, q)(P, D, Q)[Ω] model, its value for time period t + 1 can be predicted by that model, while the regression coefficient is the one calculated above. Substituting the predicted return passenger flow and the predicted residual into the regression relationship gives the predicted inbound passenger flow of the station at time period t + 1.
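The prediction procedure can be illustrated with a deliberately simplified Python sketch. Two substitutions are made and should be read as assumptions: the regression coefficient is fitted by ordinary least squares without an intercept rather than jointly with the time series model, and the S-ARIMA forecast of the residual is replaced by a seasonal-naive forecast (the residual observed one season earlier). The function names are hypothetical.

```python
def fit_regression_coefficient(inbound, return_flow):
    """Least-squares coefficient (no intercept) of inbound flow on return flow."""
    numerator = sum(y * r for y, r in zip(inbound, return_flow))
    denominator = sum(r * r for r in return_flow)
    return numerator / denominator


def forecast_inbound(inbound, return_flow, next_return_flow, season):
    """One-step forecast: coefficient * predicted return flow + predicted residual.

    inbound, return_flow: historical series of equal length (at least `season` long).
    next_return_flow: estimated return flow for period t + 1.
    season: number of periods per season; the residual forecast is seasonal-naive,
    a stand-in for the S-ARIMA forecast described in the text.
    """
    beta = fit_regression_coefficient(inbound, return_flow)
    residuals = [y - beta * r for y, r in zip(inbound, return_flow)]
    next_residual = residuals[-season]  # value one season before period t + 1
    return beta * next_return_flow + next_residual
```

In practice the residual model would be a fitted seasonal ARIMA model; the seasonal-naive step only keeps the sketch self-contained.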

In the present embodiment, for example, the model parameters are selected as (2, 0, 1)(1, 1, 0)[72]. The experimental results are shown in Table 4 below, where the M0 model does not include the return passenger flow and the M1 model adds the return passenger flow as a covariate. With the covariate, the RMSE of the training set is reduced by 9.87, the RMSE of the test set is reduced by 9.02, the SMAPE of the training set is reduced by 0.64% and the SMAPE of the test set is reduced by 0.16%, so the prediction is more accurate after the new variable is added.

TABLE 4
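The two error measures used in the comparison, RMSE and SMAPE, can be computed as in the sketch below. The patent does not state which SMAPE variant it uses; the version here, with the mean of the absolute actual and predicted values in the denominator, is one common definition and is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Terms where both values are zero contribute 0 to the mean.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denominator = (abs(a) + abs(p)) / 2
        if denominator > 0:
            total += abs(p - a) / denominator
    return 100 * total / len(actual)
```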

According to the scheme, the method provided by the invention is based on intelligent subway construction, effectively associates, fuses and introduces multi-source data related to the subway, establishes passenger travel information, and discusses the application of the passenger travel information in the aspect of passenger flow prediction. According to the invention, the travel rule of the passenger is mined according to the multi-source travel data of the passenger, and a passenger travel information three-level index system is established on the basis, so that the statistical calculation of each index is realized. Meanwhile, a prediction method for identifying the return passenger flow facing to the passenger travel information and effectively improving the accuracy of the station arrival passenger flow according to the return passenger flow is provided.

Further, in order to achieve the above object, the present invention further provides a system for predicting rail transit passenger flow based on passenger travel information, and a block diagram of the system structure is shown in fig. 2, and specifically includes:

According to one embodiment of the invention, in the index acquisition module, when a passenger enters a station and takes a train, the passenger's travel data can be acquired from the passenger's card swiping record and/or code scanning record; using PyCharm and the Python language, the passenger travel information database is connected, the data is acquired and the corresponding statistical analysis is performed. In this embodiment, the travel data includes: AFC card swiping records and APP code scanning records, used to obtain the passenger's entry and exit times and entry and exit station information; APP registration data, used to obtain the passenger's identity information and associated information; APP value-added consumption data, used to obtain the passenger's value-added service information; and POI data near the station, associated with the station and used to describe the station's geographic attributes. In this embodiment, the travel data is counted from the date on which the passenger registered the APP; the POI data near a station covers the land use types within a radius of 500 meters centered on the station and is acquired through the third-party map software Amap (Gaode Map).

The service information comprises the following secondary indexes: travel basic information, travel derived information and value-added service information. The travel basic information comprises the third-level indexes: passenger entry and exit times and passenger entry and exit station information. The travel derived information comprises the third-level indexes: average trip duration, total number of trips, daily average number of trips, trip time distribution, trip OD distribution, trip path distribution, first trip time, last trip time, holiday trip time distribution and holiday trip OD distribution. The value-added service information comprises the third-level indexes: number of participations in value-added services, participation frequency, average transaction amount, payment mode distribution, merchant type distribution and last participation time.

The derived information comprises the following secondary indexes: an active attribute and a functional attribute. The active attribute comprises the third-level index: trip activity. The functional attribute comprises the third-level indexes: passenger travel demand type, residential-area station, work-area station and value-added participation.

Further, in the present embodiment, statistically calculating each index in the passenger travel information index system includes collecting: all indexes in the identity information, associated information and travel basic information of passengers; the total number of trips, trip OD distribution, holiday trip OD distribution and trip path distribution in the travel derived information; and the number of participations, merchant type distribution, payment mode distribution and last participation time in the value-added service information. The present invention imposes no special statistical method, as long as the relevant index information can be acquired, summarized and stored. These indexes are generated directly by passenger actions such as registration and travel and can be obtained without calculation, so only summary statistics are required.

Further, in this embodiment, statistically calculating each index in the passenger travel information index system further includes calculating the third-level indexes other than those above, specifically:

calculating the average travel duration, which is the average time passenger i spends on each trip, obtained from the inbound and outbound times of the passenger's trips and the total number of historical trips;

calculating the daily average number of trips of passenger i, where D represents the total number of days of the passenger within the statistical date;

calculating the first trip time of passenger i;

calculating the last trip time, which is the time of the last trip of passenger i within the statistical date and is used to judge the activity of the passenger (i.e. the trip activity in the derived information);

the quantities appearing in these formulas are the trip index, the outbound time and the inbound time of passenger i on the corresponding trip, the average travel duration of passenger i, the total number of historical trips of passenger i, the first trip time of passenger i, the last trip time of passenger i, and the total number of trips of the passenger on holidays within the statistical date.
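The trip indexes above can be computed along the following lines. This is a sketch under stated assumptions: the average trip duration is taken as the mean of exit time minus entry time, the daily average as the trip count divided by D, the first trip time as the mean time of day of the first entry on each travel day, and the last trip time as the latest entry time. The function name and the input layout are hypothetical, and the patent's own formulas are not reproduced in this text.

```python
from datetime import datetime, time


def trip_statistics(trips, total_days):
    """Summarise one passenger's trips.

    trips: list of (entry_time, exit_time) datetime pairs for passenger i.
    total_days: D, the number of days in the statistical period.
    """
    if not trips or total_days <= 0:
        raise ValueError("need at least one trip and a positive number of days")
    n = len(trips)
    total_seconds = sum((t_out - t_in).total_seconds() for t_in, t_out in trips)
    # Earliest entry on each travel day.
    first_entry = {}
    for t_in, _t_out in trips:
        day = t_in.date()
        if day not in first_entry or t_in < first_entry[day]:
            first_entry[day] = t_in
    seconds = [t.hour * 3600 + t.minute * 60 + t.second for t in first_entry.values()]
    mean_sec = int(sum(seconds) / len(seconds))
    return {
        "total_trips": n,
        "average_trip_minutes": total_seconds / n / 60,
        "daily_average_trips": n / total_days,
        "first_trip_time": time(mean_sec // 3600, mean_sec % 3600 // 60, mean_sec % 60),
        "last_trip_time": max(t_in for t_in, _t_out in trips),
    }
```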

calculating the participation frequency from the number of times passenger i participates in the value-added service;

counting the merchant type distribution, namely the three merchant types in which the passenger participates most frequently;

calculating the average transaction amount of passenger i.

The total number of trips, the first trip time and the average trip duration are used as the indexes for clustering the passengers in a station. The number of clusters K is determined by the elbow method, whose key index is the within-cluster sum of squared errors (SSE), calculated as:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^{2}$$

where $C_k$ is the k-th cluster, $x$ is a sample in $C_k$, and $m_k$ is the center point of $C_k$;
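The elbow method can be sketched with a small standard-library K-means, shown below. The implementation (random initial centers with a fixed seed, a fixed iteration cap, no feature scaling) and the function names are assumptions for illustration; in practice the clustering indexes would usually be standardized first and a library implementation used.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_centers(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns the final cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Within-cluster sum of squared errors, assigning each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def elbow_curve(points, candidate_ks):
    """SSE for each candidate K; the 'elbow' is where the decrease levels off."""
    return {k: sse(points, kmeans_centers(points, k)) for k in candidate_ks}
```

Plotting the returned SSE values against K and choosing the point where the curve flattens gives the cluster number.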

Example: taking the passengers of a subway station as the research object, AFC data of three working days in June 2018, including the 7th and 8th, are selected as the basic data, and the travel behavior characteristics of the passengers at the station on working days are analyzed. After data screening, the number of people entering the station over the three working days is 197328.

The passengers are classified into 5 classes in total by the K-means clustering method.

As can be seen from Table 1 above, the clustering results are analyzed as follows:

The first class of passengers accounts for 21.2%. Their average number of trips in three days is 1.75, the highest travel intensity of the five classes; the first travel time is 08:22:13 and the average travel time is 27.7 min, so the travel distance is not long. This matches the morning peak, so this class can be regarded as standard morning-peak commuters.

The second class of passengers accounts for 10.2%. Their average number of trips in three days is 1.34, an ordinary travel intensity; the first travel time is 11:29:33 and the average travel time is 48.1 min, so the travel distance is long. This class has a small share and can be regarded as passengers travelling out of town or over long distances. According to the POI data, there are many bus stations and railway stations near the station, in particular Beijing North Railway Station, which makes such trips convenient.

The third class of passengers accounts for 34.5%. Their average number of trips in three days is 1.69, a relatively high travel intensity; the first travel time is 17:39:14 and the average travel time is 37.9 min, so the travel distance is moderate compared with the other classes. This matches the evening peak, so this class can be regarded as standard evening-peak commuters. It is also the largest of the five classes, which indicates that many passengers enter Xizhimen Station during the evening peak. According to the POI data, there are many office areas near the station, which supports this interpretation.

The fourth class of passengers accounts for 17.2%. Their average number of trips in three days is 1.22, the lowest travel intensity, which indicates that the travel loyalty of this class is not high; the first travel time is 20:39:40 and the average travel time is 37.1 min, so the travel distance is moderate compared with the other classes. Because the travel time is late, this class can be regarded as life-oriented passengers. According to the POI data, there are many shopping and catering merchants near the station, so these trips can be regarded as passengers going home after consumption.

The fifth class of passengers accounts for 17.1%. Their average number of trips in three days is 1.25, a low travel intensity; the first travel time is 14:05:07 and the average travel time is 29.2 min, so the travel distance is short. Like the fourth class, this class shows no distinctive features, and its share is very close to that of the fourth class, so these passengers can also be regarded as life-oriented passengers.

Determining the residential-area station: 12:00 noon is taken as the demarcation point on working days and 16:00 on holidays, and the number of times each passenger enters and exits the station in the corresponding time intervals is counted, as shown in Table 2 above.


In the present embodiment, the specific format of the part of the index labels in the passenger travel information index system is as shown in table 3 above.

In the present embodiment, the index data in the passenger travel information index system should be updated continuously as the passenger travels. The updating rules for the passenger travel information are as follows. Basic information is updated only when the passenger modifies his or her personal information. For the service information, the travel basic information, the travel derived information and the value-added service information are updated in real time with each trip and each use of a value-added service, and the service information held in Redis is synchronized to the database every month. The derived information is obtained by analyzing the basic information and the service information and is updated once a month. The last trip time of each passenger is checked every half year; if the difference between the passenger's last trip time and the update time exceeds half a year, the passenger is judged to be an inactive user and the passenger's travel information is deleted from the database.

According to an embodiment of the present invention, after the passenger travel data is acquired in the index acquisition module, the travel data is further preprocessed by the redundant data processing and error data processing described above for step a of the method.

The passenger categories included in the travel demand type, a third-level index in the passenger travel information index system, are the commuting passengers, tourist passengers, leisure and entertainment passengers, special passengers and other passengers described above. For tourist passengers, the travel time and travel frequency fluctuate strongly, the travel frequency within a short time is high, and the travel OD pairs are widely distributed.

According to an embodiment of the present invention, in the return passenger flow calculation module, the return passenger flow in different time periods in the station is estimated based on the passenger travel information index system and part of the calculated index data, in the same way as in step b of the method:

the number of people returning from station s within time period t on day v of the week is counted, where s is a station, v is a day of the week with a value range of 1 to 7 representing Monday to Sunday, and t is a time period;

the conditional probability distribution that a passenger who arrives at station s in time period a departs from station s on the return trip in time period b is calculated from the total number of weeks, the time periods a and b, the number of passengers alighting at station s in the given time period on day v of the j-th week, the arrival time at the station and the outbound time;

the return passenger flow is then estimated from this probability distribution for day v of the week.

According to an embodiment of the present invention, in the passenger flow prediction module, the return passenger flow of passengers in the station is added as a covariate to the passenger flow prediction model to predict the station inbound passenger flow, in the same way as in step c of the method:

the estimated return passenger flow is added to the S-ARIMA model. For a time series, the ARIMA(p, d, q)(P, D, Q)[Ω] model is expressed in terms of the operator B and the autoregressive and moving-average polynomials of the non-seasonal and seasonal parts, whose coefficients are to be estimated; the error term follows white noise and obeys a normal distribution with mean 0 and a given variance; and Ω represents the seasonal time period;

when the return passenger flow is used as a covariate, the inbound passenger flow equals the return passenger flow multiplied by a regression coefficient plus a residual series that obeys the ARIMA(p, d, q)(P, D, Q)[Ω] model. The regression coefficient and the residual series are calculated from the station's historical inbound passenger flow and return passenger flow. The return passenger flow of time period t + 1 is predicted from the conditional probability distribution, with the day of the week v and the station s known and the time period taken as t + 1; the residual of time period t + 1 is predicted by the ARIMA(p, d, q)(P, D, Q)[Ω] model. Substituting the predicted return passenger flow and the predicted residual into the regression relationship gives the predicted inbound passenger flow of the station at time period t + 1.

In the present embodiment, for example, the model parameters are selected as (2, 0, 1)(1, 1, 0)[72]. The results are shown in Table 4 above, where the M0 model does not include the return passenger flow and the M1 model adds the return passenger flow as a covariate. After the new variable is added, the RMSE of the training set is reduced by 9.87, the RMSE of the test set is reduced by 9.02, the SMAPE of the training set is reduced by 0.64% and the SMAPE of the test set is reduced by 0.16%, so the prediction is more accurate.

Further, to achieve the above object, the present invention also provides an electronic device, including: a processor, a memory and a computer program that is stored in the memory and can run on the processor, wherein the above method for predicting rail transit passenger flow based on passenger travel information is implemented when the computer program is executed by the processor.

In order to achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above method for predicting rail transit passenger flow based on passenger travel information.

According to the scheme, the method provided by the invention is based on intelligent subway construction, effectively associates, fuses and introduces multi-source data related to the subway, establishes passenger travel information, and discusses the application of the passenger travel information in the aspect of passenger flow prediction. According to the invention, the travel rule of the passenger is mined according to the multisource travel data of the passenger, and on the basis, a passenger travel information three-level index system is established to realize the statistical calculation of each index. Meanwhile, a prediction method for identifying the return passenger flow facing to the passenger travel information and effectively improving the accuracy of the station arrival passenger flow according to the return passenger flow is provided.

Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

It should be understood that the order of execution of the steps in the summary of the invention and the embodiments of the present invention does not absolutely imply any order of execution, and the order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the process of the embodiments of the present invention.

Claims

1. The rail transit passenger flow prediction method based on passenger travel information is characterized by comprising the following steps of:

passenger travel data are acquired, a passenger travel information index system is established based on the passenger travel data, and each index information in the passenger travel information index system is calculated in a statistical mode;

taking the return passenger flow of passengers in the station as a covariate, adding the covariate into the seasonal autoregressive moving average model, and predicting the station passenger flow;

the passenger trip data includes:

POI data near the station, associated with the station and used to describe the geographic attributes of the station;

the index information in the passenger travel information index system comprises: basic information, business information and derived information;

the derived information comprises an active attribute and a functional attribute, wherein the active attribute comprises travel liveness, and the functional attribute comprises a travel demand type of a passenger, a residence area site, a working area site and value-added participation;

the partial index data includes the in-and-out time, the residential zone site, the work zone site, and the travel demand type.

2. The passenger travel information-based rail transit passenger flow prediction method according to claim 1, wherein the average travel duration, the average daily number of trips, the travel time distribution, the first trip time and the last trip time are each calculated by a corresponding formula [formulas not reproduced in this text];

in these formulas, the terms represent, respectively:

the exit time of passenger i on a given trip;

the entry time of passenger i on that trip;

the average travel duration of passenger i;

the total number of historical trips of passenger i;

the first trip time of passenger i;

the last trip time of passenger i.
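As an illustration only, the travel indices named in claim 2 could be computed from one passenger's entry and exit records roughly as follows. This is a minimal sketch, not the claimed formulas: the function name `travel_indices`, the use of active days as the denominator for the daily trip count, the hourly bins for the time distribution, and the averaging of each day's earliest and latest entry are all assumptions.

```python
from datetime import datetime
from statistics import mean


def travel_indices(trips):
    """Summarise one passenger's trips.

    trips: list of (entry_time, exit_time) datetime pairs (hypothetical layout).
    """
    if not trips:
        raise ValueError("at least one trip is required")

    # Duration of each trip in minutes: exit time minus entry time.
    durations = [(t_out - t_in).total_seconds() / 60 for t_in, t_out in trips]

    # Group entry times (minutes after midnight) by calendar day.
    by_day = {}
    for t_in, _ in trips:
        by_day.setdefault(t_in.date(), []).append(t_in.hour * 60 + t_in.minute)

    # Share of trips starting in each hour of the day (assumed binning).
    hourly = [0] * 24
    for t_in, _ in trips:
        hourly[t_in.hour] += 1

    return {
        "avg_duration_min": mean(durations),
        # Assumption: average over days on which the passenger travelled.
        "avg_daily_trips": len(trips) / len(by_day),
        "hourly_distribution": [c / len(trips) for c in hourly],
        # Assumption: mean of each day's earliest / latest entry time.
        "first_trip_min": mean(min(v) for v in by_day.values()),
        "last_trip_min": mean(max(v) for v in by_day.values()),
    }


example = [
    (datetime(2022, 3, 1, 8, 0), datetime(2022, 3, 1, 8, 30)),
    (datetime(2022, 3, 1, 18, 0), datetime(2022, 3, 1, 18, 40)),
    (datetime(2022, 3, 2, 9, 0), datetime(2022, 3, 2, 9, 20)),
]
indices = travel_indices(example)
```

The times are kept as minutes after midnight so that first and last trip times can be averaged and later used as numeric clustering features.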

3. The passenger travel information-based rail transit passenger flow prediction method according to claim 2, wherein the participation frequency is calculated by a formula [formula not reproduced in this text], one term of which is the number of times passenger i participates in the value-added service;

the average transaction amount is calculated by a formula [formula not reproduced in this text];

and the payment mode distribution is obtained by counting the three payment modes most frequently used by the passenger.

4. The passenger travel information-based rail transit passenger flow prediction method according to claim 3, wherein the travel demand type is determined by clustering the total number of trips, the first trip time and the average travel duration obtained from passenger card-swipe data, the clustering being performed by a formula [formula not reproduced in this text] in which one term is the centre point of a cluster;

the residential area station is calculated by a formula [formula not reproduced in this text];

the work area station is calculated by a formula [formula not reproduced in this text];

wherein the terms of these formulas represent, respectively:

the probability that station e is the work area station of passenger i;

the total number of times passenger i enters and exits station e;

the number of times passenger i enters station e before 12:00 on working days;

the number of times passenger i enters station e before 16:00 on rest days;

the number of times passenger i exits station e before 12:00 on working days.
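The clustering of passengers by total number of trips, first trip time and average travel duration could be sketched as below. The clustering formula itself is not available here, so this assumes a plain k-means procedure (suggested by the reference to cluster centre points, but not confirmed); the function name `kmeans`, the deterministic initialisation and the sample feature values are hypothetical.

```python
def kmeans(points, k, iters=50):
    """Plain k-means on tuples of numeric features.

    Assumption: squared Euclidean distance and mean centre points.
    The first k points are used as initial centres to keep the sketch
    deterministic; a real implementation would scale the features and
    use a better initialisation.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# (total trips, first trip time in hours, average duration in minutes)
passengers = [(40, 8.0, 30), (42, 8.5, 35), (5, 14.0, 20), (4, 15.0, 25)]
labels, centers = kmeans(passengers, k=2)
```

In this toy data the two frequent early travellers end up in one cluster and the two occasional afternoon travellers in the other, which is the kind of grouping a travel demand type would be read from.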

5. The passenger travel information-based rail transit passenger flow prediction method according to claim 4, wherein estimating the return passenger flow at the station in different time periods based on the partial index information calculated in the passenger travel information index system comprises:

wherein s is a station, v is the day of the week and takes values 1 to 7, t is a time period, and the quantity to be estimated is the number of passengers returning from station s within time period t on day v of a week;

a conditional probability distribution over time periods for passengers arriving at station s is calculated by a formula [formula not reproduced in this text], in which the terms represent, respectively: the total number of weeks; time period a; time period b; the number of passengers exiting at station s during time period a on day v of week j; the entry time; and the exit time;

the return passenger flow is estimated from said conditional probability distribution by a formula [formula not reproduced in this text], one term of which refers to day v of a week.
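One way to picture the estimation in claim 5 is the sketch below, which is an assumed reading rather than the claimed formulas: for a fixed station and day of the week, the historical share of passengers who exited in period a and re-entered in period b is taken as the conditional probability, and the expected return flow in a period is the sum over exit periods of exit counts times that probability. The names `return_probabilities` and `estimate_return_flow` and the record layout are hypothetical.

```python
from collections import Counter


def return_probabilities(history):
    """Estimate P(return in period b | exit in period a).

    history: iterable of (exit_period, return_period) pairs pooled over
    all observed weeks for one station and one day of the week;
    return_period is None when the passenger did not re-enter.
    """
    exits = Counter()
    pairs = Counter()
    for a, b in history:
        exits[a] += 1
        if b is not None:
            pairs[(a, b)] += 1
    return {(a, b): n / exits[a] for (a, b), n in pairs.items()}


def estimate_return_flow(exit_counts, probs, t):
    """Expected number of returning passengers entering in period t.

    exit_counts: {exit_period: number of passengers who exited}.
    """
    return sum(n * probs.get((a, t), 0.0) for a, n in exit_counts.items())


history = [(8, 18), (8, 18), (8, 17), (8, None), (9, 18), (9, None)]
probs = return_probabilities(history)
flow_18 = estimate_return_flow({8: 100, 9: 40}, probs, t=18)
flow_17 = estimate_return_flow({8: 100, 9: 40}, probs, t=17)
```

Pooling over weeks plays the role of the sum over the total number of weeks mentioned in the claim; keeping separate tables per station and per day of the week would be done outside these two functions.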

6. The passenger travel information-based rail transit passenger flow prediction method according to claim 5, wherein adding the return passenger flow of passengers at the station as a covariate to the passenger flow prediction model and predicting the inbound passenger flow comprises:

adding the estimated return passenger flow $x_t$ as a covariate to the seasonal autoregressive moving average model;

the seasonal autoregressive moving average model is ARIMA(p, d, q)(P, D, Q)[Ω], where p, d and q are the autoregressive, differencing and moving-average orders of the non-seasonal part, P, D and Q are the autoregressive, differencing and moving-average orders of the seasonal part, and Ω is the number of periods per season;

for a time series $\{z_t\}$, the ARIMA(p, d, q)(P, D, Q)[Ω] model is as follows:

$$\phi_p(B)\,\Phi_P(B^{\Omega})\,(1-B)^{d}\,(1-B^{\Omega})^{D}\,z_t = \theta_q(B)\,\Theta_Q(B^{\Omega})\,\varepsilon_t$$

wherein B is the backshift operator, defined by $B z_t = z_{t-1}$, and

$$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p},$$

$$\Phi_P(B^{\Omega}) = 1 - \Phi_1 B^{\Omega} - \cdots - \Phi_P B^{P\Omega},$$

$$\theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q},$$

$$\Theta_Q(B^{\Omega}) = 1 + \Theta_1 B^{\Omega} + \cdots + \Theta_Q B^{Q\Omega},$$

wherein $\phi_i$, $\Phi_i$, $\theta_i$ and $\Theta_i$ are the coefficients to be estimated, $\varepsilon_t$ is a white-noise error term following a normal distribution with mean 0 and variance $\sigma^2$, and t denotes the t-th time period;

when the return passenger flow $x_t$ serves as a covariate, the inbound passenger flow $y_t$ and the return passenger flow $x_t$ satisfy the following relationship:

$$y_t = \beta x_t + \eta_t$$

wherein $\beta$ is the regression coefficient, and $\eta_t$ follows the ARIMA(p, d, q)(P, D, Q)[Ω] model and represents the part of the total inbound passenger flow other than the return passenger flow; $\beta$ and $\eta_t$ are calculated from the station's historical $y_t$ and $x_t$; after $\eta_t$ is obtained, $\eta_{t+1}$ is predicted by the ARIMA(p, d, q)(P, D, Q)[Ω] formula; $x_{t+1}$ is then obtained from the return passenger flow estimate of claim 5, with the day of the week v and the station s known and the time period taken as t + 1; substituting $x_{t+1}$ and $\eta_{t+1}$ into $y_{t+1} = \beta x_{t+1} + \eta_{t+1}$ gives the predicted inbound passenger flow for time period t + 1.
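The prediction steps of claim 6 (fit a regression coefficient on the return flow, model the remainder, forecast the remainder, then recombine) can be sketched as follows. To stay self-contained this uses the simplest seasonal special case, ARIMA(0, 0, 0)(0, 1, 0)[Ω], for the remainder, under which the coefficient is the least-squares slope on seasonally differenced data and the remainder forecast is its value one season earlier. The orders, the function names and the sample numbers are assumptions; a general-order fit would normally be delegated to a time-series library.

```python
def seasonal_diff(series, season):
    """Seasonal difference: series[i] - series[i - season]."""
    return [series[i] - series[i - season] for i in range(season, len(series))]


def forecast_inbound(inbound, returns, next_return, season):
    """One-step inbound forecast with return flow as covariate.

    Assumes the remainder follows ARIMA(0,0,0)(0,1,0)[season], i.e. its
    seasonal difference is white noise. Returns (forecast, beta).
    """
    if len(inbound) != len(returns) or len(inbound) <= season:
        raise ValueError("need equal-length series longer than one season")

    # Regression coefficient from seasonally differenced series.
    dy = seasonal_diff(inbound, season)
    dx = seasonal_diff(returns, season)
    denom = sum(v * v for v in dx)
    if denom == 0:
        raise ValueError("return flow has no seasonal variation to fit on")
    beta = sum(a * b for a, b in zip(dx, dy)) / denom

    # Remainder: inbound flow not explained by return flow.
    remainder = [y - beta * x for y, x in zip(inbound, returns)]

    # Under the assumed model the forecast is the value one season back.
    remainder_next = remainder[-season]

    return beta * next_return + remainder_next, beta


returns = [5, 7, 6, 8, 4, 9, 6, 5, 7]            # estimated return flow
inbound = [20, 34, 42, 26, 28, 48, 22, 30, 44]   # observed inbound flow
forecast, beta = forecast_inbound(inbound, returns, next_return=10, season=3)
```

Here `next_return` stands for the return flow estimate for period t + 1; the sample data were built as twice the return flow plus a repeating pattern of 10, 20, 30, so the sketch recovers a coefficient of 2 and a forecast of 30.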

7. A rail transit passenger flow prediction system based on passenger travel information, characterized by comprising:

a passenger flow prediction module, configured to add the return passenger flow of passengers at the station, as a covariate, to a seasonal autoregressive moving average model to predict the inbound passenger flow of the station;

the passenger trip data includes:

the partial index information comprises the station entry and exit times, the residential area station, the work area station, and the travel demand type.

8. An electronic device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the rail transit passenger flow prediction method based on passenger travel information according to any one of claims 1 to 6.

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the rail transit passenger flow prediction method based on passenger travel information according to any one of claims 1 to 6.