CN110119884B - High-speed railway passenger flow time interval division method based on neighbor propagation clustering - Google Patents

Info

Publication number
CN110119884B
Authority
CN
China
Prior art keywords
passenger flow
time
passenger
sample
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910307332.7A
Other languages
Chinese (zh)
Other versions
CN110119884A (en
Inventor
王文宪
肖蒙
翟玉江
林群煦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuyi University
Original Assignee
Wuyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuyi University filed Critical Wuyi University
Priority to CN201910307332.7A priority Critical patent/CN110119884B/en
Publication of CN110119884A publication Critical patent/CN110119884A/en
Application granted granted Critical
Publication of CN110119884B publication Critical patent/CN110119884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q50/40

Abstract

The invention provides a high-speed railway passenger flow time interval division method based on neighbor propagation (affinity propagation) clustering. The method divides the statistical period into a number of time points, collects the passenger flow data of each time point, and constructs a time sample sequence of preprocessed sample variables. It then partitions the sample sequence with the neighbor propagation clustering algorithm. Finally, it determines the optimal clustering result using cluster validity indexes such as CH, Hartigan and IGP, and from that result forms the division of the year into operation time intervals. The method objectively and accurately reflects the passenger flow demand of the different time intervals in the year and overcomes the strong subjectivity, low efficiency and low precision of manual division, thereby laying a foundation for the adaptive adjustment of the train operation scheme.

Description

High-speed railway passenger flow time interval division method based on neighbor propagation clustering
Technical Field
The invention relates to the technical field of high-speed railways, in particular to a high-speed railway passenger flow time interval division method based on neighbor propagation clustering.
Background
As the "eight vertical and eight horizontal" high-speed railway network and the intercity railway network are gradually completed, more and more medium- and long-distance passengers choose the high-speed railway as their preferred mode of travel. The train operation scheme, which specifies the number of passenger trains operated, their running sections, their stops and so on, is an important factor influencing passenger service quality. To improve the service quality for the passenger flow of every direction on the network while keeping train operating costs as low as possible, the high-speed railway passenger transport management department needs to adjust the train operation scheme adaptively according to the fluctuation of passenger flow over the year, so as to meet the passenger flow demand of the different time intervals within the year.
The passenger flow distribution condition on the train in the railway network is an important basis for evaluating the implementation efficiency of the passenger train operation scheme. The actual passenger flow distribution is usually adopted to evaluate the running scheme of the passenger train being implemented, but for the passenger train running scheme to be optimally designed, the passenger flow distribution on the train can be generated only by means of a passenger flow distribution means to evaluate. Because the passenger flow distribution efficiency and the reasonability of the result directly influence the optimization level of the passenger train operation scheme, the train passenger flow distribution method is one of important basic research subjects for researching the optimization of the railway passenger train operation scheme.
However, adjusting the train operation scheme involves many factors and is a large, complex systems engineering task, so the number of adjustments per year is limited. A feasible strategy is to divide the operating year of the high-speed railway into time intervals according to the fluctuation characteristics of passenger flow and then adjust the train operation scheme according to the passenger flow of each interval. A scientific and reasonable division of operation time intervals is therefore both the basic premise of adjusting the train operation scheme and an important guarantee that the adjustment matches the dynamic passenger flow demand. The existing division method splits the operating year into a Spring Festival travel period, a summer travel period, holiday periods and off-peak periods according to the year-round variation of the total passenger flow counted on the target line. Although this method reflects the differences in passenger volume between periods, its result depends to a great extent on the experience of field engineers. It is therefore highly subjective, easily produces unreasonable divisions, and has difficulty accurately reflecting passenger flow demand that varies seasonally within the year.
No research on the division of high-speed railway operation time intervals has yet been reported at home or abroad. The problem is similar in nature to the division of traffic periods in time-of-day (TOD) multi-period signal control at road intersections. Scholars at home and abroad have studied the TOD problem; in that work, traffic periods are mainly divided by plotting the one-day cumulative traffic curve of a representative intersection and using manual experience to pick the time nodes at which the curve changes markedly as division points. Earlier train passenger flow distribution methods were rarely studied independently and were usually applied within research on optimizing train operation schemes. Those studies build a passenger transfer network from a given train operation scheme, design the generalized travel cost of passengers (including fare, travel time, congestion effects and so on), establish a static user equilibrium or stochastic user equilibrium assignment model, and assign flows to train running sections in equilibrium (see the research on railway passenger train operation schemes [J], Journal of the China Railway Society, 2004, 26(2): 16-20). Urban rail transit passenger flow distribution is very similar to high-speed railway passenger flow distribution, and there is a great deal of research in that field. That research mainly considers capacity constraints and space-time priority in the distribution process but does not analyze passengers' ticket-purchasing behavior, which has an important influence on travel choice on a high-speed railway. A passenger flow distribution method suited to the high-speed railway transportation network therefore needs to be designed.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a high-speed railway passenger flow time interval division method based on neighbor propagation clustering, which divides the whole year reasonably according to the variation characteristics of the passenger flow of each direction on a high-speed railway line and improves how well the train operation scheme matches the passenger flow demand.
The technical scheme of the invention is as follows: a high-speed railway passenger flow time interval division method based on neighbor propagation clustering comprises the following steps:
S1), dividing one year into T time points, letting the high-speed railway line X comprise n important stations, and counting, in the up or down direction of line X, the passenger flow sending volume of each station within each time interval $t_k$, namely
$$X_k = \left(x_k^1, x_k^2, \ldots, x_k^n\right)$$
where $X_k$ denotes the passenger flow sending volume of high-speed railway line X in time interval $t_k$, and $x_k^n$ denotes the passenger flow sending volume of the n-th station of line X in time interval $t_k$;
S2), using a threshold δ to judge whether the passenger flow sending volume at each time point is abnormal; specifically, collecting the passenger flow sending vectors of the m time points adjacent to time point $t_l$ and calculating their mean
Figure BDA0002030304960000023
If
Figure BDA0002030304960000024
then the passenger flow sending volume at time point $t_l$ is regarded as normal data; otherwise it is regarded as abnormal data;
where $X_l$ is the passenger flow sending volume at time point $t_l$;
if the data are abnormal, deleting them and repairing the passenger flow sending volume at that time point by fitting the passenger flow data of its m adjacent time points, with the calculation formulas:
$$L_n(t) = \sum_{i=1}^{m} X_i \, l_i(t)$$
$$l_i(t) = \prod_{j=1,\ j \neq i}^{m} \frac{t - t_j}{t_i - t_j}$$
where $L_n(t)$ is the Lagrange interpolation polynomial, $l_i(t)$ is the fitting (basis) polynomial associated with the i-th time point, t is the time point to be fitted, $t_j$ is the j-th time point, $t_i$ is the i-th time point, and $X_i$ is the passenger flow matrix at the i-th time point;
merging the repaired data with the original normal data, and then standardizing by the standard deviation to eliminate differences of scale between the variables, with the calculation formula:
$$Z\text{-score} = \frac{x - \bar{x}}{\sigma}$$
where Z-score is the standardized value, x is the passenger sending volume of a station at a given time point, $\bar{x}$ is the mean of the passenger sending volume of all stations over the year, and σ is the standard deviation of the passenger sending volume of all stations over the year;
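The preprocessing in step S2 can be illustrated with a minimal Python sketch. Several details are assumptions, not taken from the patent: the abnormality test is written as a relative deviation from the neighbor mean (the exact inequality is not reproduced in this text), the m neighbors are taken as m/2 time points on each side, standardization is done per station column, and the function names are hypothetical.

```python
import numpy as np

def neighbor_indices(l, m, T):
    """Up to m time points around l (m // 2 on each side), excluding l itself."""
    half = m // 2
    return [j for j in range(max(0, l - half), min(T, l + half + 1)) if j != l]

def is_abnormal(X, l, m, delta):
    """Assumed rule: abnormal if any station deviates from the
    neighbor mean by more than delta in relative terms."""
    mean = X[neighbor_indices(l, m, len(X))].mean(axis=0)
    return bool(np.any(np.abs(X[l] - mean) > delta * np.abs(mean)))

def lagrange_repair(X, l, m):
    """Evaluate the Lagrange interpolant through the neighbors of l at t = l."""
    nodes = neighbor_indices(l, m, len(X))
    repaired = np.zeros(X.shape[1])
    for i in nodes:
        basis = 1.0
        for j in nodes:
            if j != i:
                basis *= (l - j) / (i - j)  # l_i(t) evaluated at t = l
        repaired += basis * X[i]
    return repaired

def z_score(X):
    """Standardize each station column to zero mean and unit standard deviation."""
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X - X.mean(axis=0)) / std
```

Here `X` is a T-by-n array with one row per time point and one column per station. Because the interpolant passes through the neighboring points, a locally linear trend is reproduced exactly at the repaired time point.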
S3), performing time interval division based on the neighbor propagation clustering algorithm: the passenger flow sending volume samples of all T time points in the data set are taken as candidate class representatives, and the similarity s(i, k) between the passenger flow sending volumes of any two time intervals is evaluated; that is, s(i, k) expresses how similar the passenger flow sending volume sample $X_k$ of time interval k is to the sample $X_i$ of time interval i, i.e. how well sample $X_k$ is suited to serve as the class representative of sample $X_i$. When the algorithm is initialized, all samples are assumed to be equally likely to become class representatives, i.e. all s(k, k) are set to the same median attraction value p. The similarity of any two samples is calculated as:
$$s(i,k) = -\lVert x_i - x_k \rVert^2$$
where $x_i$ denotes the passenger flow sending volume sample of time interval i and $x_k$ denotes the passenger flow sending volume sample of time interval k;
defining a credibility matrix r and an availability matrix a (the responsibility and availability of affinity propagation), where r(i, k) points from sample $x_i$ to sample $x_k$ and expresses how well $x_k$ is suited to be the class representative of $x_i$, and a(i, k) points from sample $x_k$ to sample $x_i$ and expresses how appropriate it is for $x_i$ to choose $x_k$ as its class representative; for any sample $x_i$, calculating the sum of the credibility r(i, k) and the availability a(i, k) with respect to the passenger flow sending volume samples of the other time intervals, taking the sample $x_k$ with the largest sum as the class representative, and outputting the classification results of all time points;
the method specifically comprises the following steps:
S301), setting the initial values of the credibility matrix r(i, k) and the availability matrix a(i, k) to 0;
S302), calculating the similarity matrix s(i, k) of the passenger flow sending volume samples of any two time intervals, using the negative squared Euclidean distance as the measure, i.e. $s(i,k) = -\lVert x_i - x_k \rVert^2$, and
setting the diagonal elements s(k, k) to the same median attraction value, i.e.
Figure BDA0002030304960000035
where N is the number of samples;
S303), updating the credibility matrix r(i, k) and the availability matrix a(i, k), where the credibility matrix r(i, k) is updated by:
$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \left\{ a(i,k') + s(i,k') \right\}$$
and the availability matrix a(i, k) is updated by:
$$a(i,k) \leftarrow \begin{cases} \min\left\{0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\ r(i',k)\}\right\}, & i \neq k \\ \sum_{i' \neq k} \max\{0,\ r(i',k)\}, & i = k \end{cases}$$
S304), setting a damping factor λ to eliminate numerical oscillation during the iteration, i.e.
$$r_{new}(i,k) \leftarrow \lambda\, r_{old}(i,k) + (1-\lambda)\, r_{new}(i,k), \qquad a_{new}(i,k) \leftarrow \lambda\, a_{old}(i,k) + (1-\lambda)\, a_{new}(i,k)$$
where $r_{new}(i,k)$ and $r_{old}(i,k)$ are the credibility matrices obtained by the current and the previous update respectively; $a_{new}(i,k)$ and $a_{old}(i,k)$ are the availability matrices obtained by the current and the previous update respectively; and λ ∈ (0, 1) is the damping factor;
S305), for any passenger flow sending volume sample, calculating the sum of its credibility and availability with respect to all passenger flow sending volume samples, and according to
$$k^{*} = \arg\max_{k} \left\{ a(i,k) + r(i,k) \right\}$$
finding the class center sample of each sample;
S306), updating the iteration counter n ← n + 1 and judging whether the message iteration has reached the set maximum number of iterations, i.e. whether $n \geq n_{max}$; if so, the algorithm terminates and the class division results of all time points are output; if not, returning to step S302);
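Steps S301 to S306 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the shared diagonal value is taken as the median of the off-diagonal similarities (the patent's own formula for it is not reproduced in this text), the loop simply runs a fixed number of iterations, and the function and parameter names are hypothetical.

```python
import numpy as np

def affinity_propagation(X, damping=0.9, max_iter=300, preference=None):
    """Return, for each sample, the index of its class representative."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    # S302: s(i,k) = -||x_i - x_k||^2
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    if preference is None:
        # Assumption: median of the off-diagonal similarities
        preference = np.median(S[~np.eye(N, dtype=bool)])
    S[np.diag_indices(N)] = preference
    R = np.zeros((N, N))  # S301: credibility (responsibility) matrix
    A = np.zeros((N, N))  # S301: availability matrix
    rows = np.arange(N)
    for _ in range(max_iter):  # S306: stop at the maximum iteration count
        # S303: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new  # S304: damping
        # S303: availability update from the positive responsibilities
        Rp = np.maximum(R, 0)
        Rp[np.diag_indices(N)] = R.diagonal()
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()  # a(k,k) is not clipped at zero
        A_new = np.minimum(A_new, 0)
        A_new[np.diag_indices(N)] = diag
        A = damping * A + (1 - damping) * A_new  # S304: damping
    # S305: class center of sample i is argmax_k [a(i,k) + r(i,k)]
    return (A + R).argmax(axis=1)
```

Samples that return the same index belong to the same class. Lowering `preference` tends to yield fewer classes, which is one way to produce the series of clusterings that step S4 compares.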
S4), calculating the Calinski-Harabasz, Hartigan and In-Group Proportion (IGP) indexes for the class division results with different numbers of time point classes, and selecting the optimal number of time point classes and the corresponding class division result;
S5), checking and correcting the operation time interval division result: looping over all classes and comparing the samples of each class pairwise; if the time points of two samples are adjacent, the two samples are merged into the same operation time interval, and otherwise they are assigned to different operation time intervals;
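One simple way to realize step S5 is to scan the class labels in time order and start a new operation time interval whenever the label changes. The sketch below assumes the labels are stored as a list indexed by time point; the function name is hypothetical.

```python
def split_into_periods(labels):
    """Group consecutive time points with the same class label.

    Returns (start_index, end_index, label) tuples with inclusive ends.
    """
    periods = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run at the end of the data or when the label changes
        if t == len(labels) or labels[t] != labels[start]:
            periods.append((start, t - 1, labels[start]))
            start = t
    return periods
```

A class whose time points are not contiguous therefore yields several operation time intervals, which is how 5 classes can give rise to more than 5 intervals.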
S6), evaluating passenger flow demand adaptability: after the operation time intervals have been divided, a train operation scheme is compiled from the mean passenger flow demand of each time interval and a passenger flow distribution simulation is carried out; three indexes, namely the passenger flow demand satisfaction rate, the average train occupancy rate and the direct passenger flow rate, are introduced to evaluate quantitatively and summarize how well the passenger flow demand of each time interval matches the train operation scheme.
Furthermore, the Calinski-Harabasz index is a measure based on the within-class and between-class dispersion matrices of all samples, and the number of classes at which it attains its maximum is taken as the optimal number of clusters, namely
$$CH(k) = \frac{\operatorname{trB}(k)/(k-1)}{\operatorname{trW}(k)/(n-k)}$$
where k is the number of clusters, trB(k) is the trace of the between-class dispersion matrix, trW(k) is the trace of the within-class dispersion matrix, and n is the number of time point samples.
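As a worked illustration of the Calinski-Harabasz index, the following sketch computes trB(k) and trW(k) directly from a labeled sample set. The function names are hypothetical, and the index is undefined for k = 1 or k = n.

```python
import numpy as np

def scatter_traces(X, labels):
    """Return (trB, trW): traces of the between-class and within-class dispersion matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        center = members.mean(axis=0)
        tr_w += ((members - center) ** 2).sum()
        tr_b += len(members) * ((center - grand_mean) ** 2).sum()
    return tr_b, tr_w

def calinski_harabasz(X, labels):
    """CH(k) = [trB(k) / (k - 1)] / [trW(k) / (n - k)]."""
    n, k = len(X), len(np.unique(labels))
    tr_b, tr_w = scatter_traces(X, labels)
    return (tr_b / (k - 1)) / (tr_w / (n - k))
```

For the four one-dimensional samples 0, 2, 10, 12 split into two classes, trB = 100 and trW = 4, so CH(2) = (100 / 1) / (4 / 2) = 50.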
Further, the Hartigan index can be used even when the number of clusters is 1, and the smallest number of clusters k satisfying H(k) ≤ 10 is taken as the optimal number of clusters, namely
$$H(k) = \left( \frac{\operatorname{trW}(k)}{\operatorname{trW}(k+1)} - 1 \right) (n - k - 1)$$
where k is the total number of time point classes in the sample clustering result, trW(k) is the trace of the within-class dispersion matrix, and n is the number of time point samples.
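The Hartigan rule can be sketched in the same style. The helper names are hypothetical; the code assumes that a k-class labeling and a (k+1)-class labeling of the same samples are available, for example from two runs of the clustering step.

```python
import numpy as np

def within_trace(X, labels):
    """trW: sum of squared distances of the samples to their class centers."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def hartigan(X, labels_k, labels_k1):
    """H(k) = (trW(k) / trW(k+1) - 1) * (n - k - 1)."""
    n = len(X)
    k = len(np.unique(labels_k))
    return (within_trace(X, labels_k) / within_trace(X, labels_k1) - 1) * (n - k - 1)

def smallest_k_below(h_by_k, threshold=10):
    """Smallest k whose Hartigan index does not exceed the threshold, or None."""
    for k in sorted(h_by_k):
        if h_by_k[k] <= threshold:
            return k
    return None
```

For the samples 0, 2, 10, 12, trW(1) = 104 and trW(2) = 4, so H(1) = (104 / 4 - 1) × (4 - 1 - 1) = 50, which exceeds 10 and indicates that a single class is not enough.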
Further, the In-Group Proportion (IGP) index measures whether the sample nearest to each sample of a given class belongs to the same class. The larger the average IGP over all clusters, the better the clustering quality, and the number of classes at which the average attains its maximum is the optimal number of clusters, namely
$$IGP(u) = \frac{\#\{\, j \mid class(j) = class(j^{N}) = u \,\}}{\#\{\, j \mid class(j) = u \,\}}$$
where u is the class label of a cluster, class(j) is the class label of sample j, $j^{N}$ is the sample nearest to sample j, and # denotes the number of samples satisfying the condition.
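A minimal sketch of the IGP computation is given below, assuming Euclidean distance for the nearest-sample search; the function name is hypothetical.

```python
import numpy as np

def in_group_proportion(X, labels):
    """For each class u, the share of its samples whose nearest sample is also in u."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is not its own nearest neighbor
    nearest = D.argmin(axis=1)
    same = labels[nearest] == labels
    return {c: float(same[labels == c].mean()) for c in np.unique(labels)}
```

Averaging the returned values over all classes gives the figure that is compared across candidate numbers of clusters.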
Further, the passenger flow demand satisfaction rate reflects how well the passenger transport capacity provided by the train operation scheme meets the demand of each passenger flow direction on the high-speed railway and its related network. It is expressed as the ratio of the passenger volume that is effectively transported under the given train operation scheme, subject to the transport capacity constraints and in particular the seating capacity of the trains, to the total passenger flow demand. The calculation formula is:
Figure BDA0002030304960000053
where $q'_w$ is the total passenger flow transported by the high-speed railway for passenger flow OD pair w, and w is the index of the passenger flow directions of the network.
Further, the average train occupancy rate is the mean of the occupancy rates of all trains within the evaluation range, where the occupancy rate of a train is the weighted ratio of the passenger volume it carries over its running sections to the total number of seats it provides. The index reflects how passengers of different passenger flow OD pairs choose among the various types of high-speed trains. The calculation formula is:
Figure BDA0002030304960000054
where
Figure BDA0002030304960000055
is the passenger volume of train h in section (i, j), $A_h$ is the rated passenger capacity of train h, and $E_h$ is the number of sections in which train h runs.
Further, the passenger flow demand structure is composed of different demand directions, each of which reaches its destination by a direct or a transfer travel plan. The direct passenger flow rate is the ratio, under the given train operation scheme and passenger flow demand structure, of the passenger volume that reaches its destination directly without transfer between each pair of passenger flow demand points to the total passenger volume carried. The calculation formula is:
Figure BDA0002030304960000061
where w is the index of the passenger flow directions of the network, |e| is the number of transfers of the passenger flow of a given direction,
Figure BDA0002030304960000062
is the number of passengers of passenger flow OD pair w who reach their destination directly without transfer, and
Figure BDA0002030304960000063
is the number of passengers of passenger flow OD pair w who reach their destination with |e| transfers.
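The three evaluation indexes can be illustrated with a short sketch written from their verbal definitions. The exact formulas are not reproduced in this text, so the expressions below are assumptions: each index is taken as a plain ratio, the occupancy rate of a train is averaged over its sections, and the data layout and function names are hypothetical.

```python
def demand_satisfaction_rate(carried, demand):
    """Transported passengers divided by total demand, summed over OD pairs."""
    return sum(carried.values()) / sum(demand.values())

def average_occupancy_rate(section_loads, capacity):
    """Mean over trains of (passengers summed over sections) / (capacity * sections).

    section_loads[h] lists the passengers on train h in each running section;
    capacity[h] is the rated passenger capacity of train h.
    """
    rates = [sum(loads) / (capacity[h] * len(loads))
             for h, loads in section_loads.items()]
    return sum(rates) / len(rates)

def direct_rate(direct, transfer):
    """Passengers arriving without transfer divided by all carried passengers."""
    total_direct = sum(direct.values())
    total_transfer = sum(transfer.values())
    return total_direct / (total_direct + total_transfer)
```

For example, a train with 500 seats carrying 400 and 500 passengers over two sections has an occupancy rate of 900 / 1000 = 0.9 under this reading.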
The invention has the beneficial effects that:
1. The method combines the passenger flow data of the stations along the line at each time point, uses the neighbor propagation algorithm to cluster and merge time points with similar passenger flow over the year, and determines the optimal number of clusters from the CH, Hartigan and IGP indexes, which improves the accuracy of the classification;
2. The high-speed railway operation time interval division method based on cluster analysis provided by the invention, combined with testing of the results by cluster validity indexes, objectively and accurately reflects the passenger flow demand of the different time intervals in the year and overcomes the strong subjectivity, low efficiency and low precision of manual division, thereby laying a foundation for the adaptive adjustment of the train operation scheme;
3. After the optimal clustering result is determined, the clustering result is analyzed manually, and the accuracy of the plan is ensured by checking and correcting the operation time interval division result; in addition, passenger flow demand adaptability is evaluated by the passenger flow demand satisfaction rate, the average train occupancy rate and the direct passenger flow rate, which further improves how well the passenger flow demand of each time interval matches the train operation scheme;
4. The invention also preprocesses the collected data: abnormal data are deleted and repaired by fitting with the Lagrange interpolation method, which ensures the usability of the data and hence the reliability of the planning result.
Drawings
FIG. 1 is a flowchart of the high-speed railway operation time interval division based on neighbor propagation clustering;
FIG. 2 is a schematic diagram of the validity index values for different numbers of clusters in the 2014 operation time interval division;
FIG. 3 is a schematic diagram of the validity index values for different numbers of clusters in the 2015 operation time interval division;
FIG. 4 is a schematic diagram of the clustering result for the 2014 operation time intervals;
FIG. 5 is a schematic diagram of the clustering result for the 2015 operation time intervals.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings:
As shown in FIG. 1, this embodiment provides a high-speed railway passenger flow time interval division method based on neighbor propagation clustering. For ease of understanding, the embodiment uses as data the passenger flow sending volume of a high-speed railway in normal operation from January 1, 2014 to December 31, 2014 and from January 1, 2015 to December 31, 2015; the railway has 9 stations. The method specifically comprises the following steps:
S1), dividing each of the two years, January 1 to December 31, 2014 and January 1 to December 31, 2015, into 365 time points, i.e. one time point per day, and counting the passenger flow in the down direction of the railway at each time point, namely
Figure BDA0002030304960000071
which describes the passenger flow OD matrix in the down direction of the line, where $X_k$ is the passenger flow at time point k and
Figure BDA0002030304960000072
is the passenger flow of station i at time point k;
S2), using a threshold δ to judge whether the passenger flow sending volume at each time point is abnormal; specifically, collecting the passenger flow sending vectors of the m time points adjacent to time point $t_l$ and calculating their mean
Figure BDA0002030304960000073
If
Figure BDA0002030304960000074
then the passenger flow sending volume at time point $t_l$ is regarded as normal data; otherwise it is regarded as abnormal data;
where $X_l$ is the passenger flow sending volume at time point $t_l$;
if the data are abnormal, deleting them and repairing the passenger flow sending volume at that time point by fitting the passenger flow data of its m adjacent time points, with the calculation formulas:
$$L_n(t) = \sum_{i=1}^{m} X_i \, l_i(t)$$
$$l_i(t) = \prod_{j=1,\ j \neq i}^{m} \frac{t - t_j}{t_i - t_j}$$
where $L_n(t)$ is the Lagrange interpolation polynomial, $l_i(t)$ is the fitting (basis) polynomial associated with the i-th time point, t is the time point to be fitted, $t_j$ is the j-th time point, $t_i$ is the i-th time point, and $X_i$ is the passenger flow matrix at the i-th time point;
merging the repaired data with the original normal data, and then standardizing by the standard deviation to eliminate differences of scale between the variables, with the calculation formula:
$$Z\text{-score} = \frac{x - \bar{x}}{\sigma}$$
where Z-score is the standardized value, x is the passenger sending volume of a station at a given time point, $\bar{x}$ is the mean of the passenger sending volume of all stations over the year, and σ is the standard deviation of the passenger sending volume of all stations over the year;
S3), performing time interval division based on the Affinity Propagation (AP) clustering algorithm, which clusters on the similarity matrix S formed from the sample data points and, like other clustering algorithms, aims to minimize the distance between each data point and the representative point of its class, thereby realizing the class division; specifically comprising the following steps:
S301), initializing the values of the credibility matrix r(i, k) and the availability matrix a(i, k) to 0;
S302), calculating the sample similarity matrix s(i, k), using the negative squared Euclidean distance as the measure, i.e. $s(i,k) = -\lVert x_i - x_k \rVert^2$, and setting the diagonal elements s(k, k) to the same median attraction value, i.e.
Figure BDA0002030304960000081
where N is the number of samples, which is 365 in this embodiment (one sample per time point);
S303), updating the credibility matrix r(i, k) and the availability matrix a(i, k), with the following calculation formulas:
the credibility matrix r(i, k) is updated by:
$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \left\{ a(i,k') + s(i,k') \right\}$$
the availability matrix a(i, k) is updated by:
$$a(i,k) \leftarrow \begin{cases} \min\left\{0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\ r(i',k)\}\right\}, & i \neq k \\ \sum_{i' \neq k} \max\{0,\ r(i',k)\}, & i = k \end{cases}$$
S304), setting a damping factor to eliminate numerical oscillation during the iteration:
$$r_{new}(i,k) \leftarrow \lambda\, r_{old}(i,k) + (1-\lambda)\, r_{new}(i,k), \qquad a_{new}(i,k) \leftarrow \lambda\, a_{old}(i,k) + (1-\lambda)\, a_{new}(i,k)$$
where $r_{new}(i,k)$ and $r_{old}(i,k)$ are the credibility matrices obtained by the current and the previous update respectively; $a_{new}(i,k)$ and $a_{old}(i,k)$ are the availability matrices obtained by the current and the previous update respectively; in this embodiment the damping factor λ is set to 0.9;
S305), for the passenger flow data sample of any time point, calculating the sum of its credibility and availability with respect to all samples, and according to
$$k^{*} = \arg\max_{k} \left\{ a(i,k) + r(i,k) \right\}$$
finding the class center sample of each sample, and then outputting the classification results of all time points;
S4), because the AP algorithm of step S3) outputs a series of clustering results when clustering the samples, the Calinski-Harabasz, Hartigan and In-Group Proportion indexes are used to test the validity of the clustering results obtained by the algorithm; the results are shown in FIGS. 2 and 3. The figures show that the optimal number of clusters for the samples based on the 2014 and 2015 high-speed railway passenger flow data is 5, and this clustering is taken as the final result and depicted in FIGS. 4 and 5;
S5), looping over all classes, comparing the passenger flow data samples of the time points in each class pairwise, and splitting the non-consecutive time points of the same class, which yields the operation time interval division result of the high-speed railway for 2014 and 2015, shown in Table 1;
Table 1 Operation time interval division result
Figure BDA0002030304960000086
Figure BDA0002030304960000091
As can be seen from Table 1, the time interval division based on the passenger flow pattern yields 5 classes in both 2014 and 2015, and the 365 days of each year can be divided into 13 operation time intervals. Operation time intervals 3, 6, 7, 8 and 12 cover the same time span in 2014 and 2015, while the time spans of the other operation time intervals differ. The reason is that the Spring Festival travel period falls on different dates in different years: the Spring Festival of 2014 fell on January 31, i.e. day 31 of the year, and the Spring Festival of 2015 fell on February 19, i.e. day 50 of the year. It can be seen that operation time interval 2 begins 7 days before the Spring Festival in each year. The parts of the year corresponding to the other operation time intervals are clear and can be summarized as follows:
operation time interval 1 is the off-peak passenger flow period after New Year's Day and before the Spring Festival; operation time intervals 2 to 4 are the Spring Festival travel peak; operation time interval 5 is the off-peak period between the Spring Festival travel period and the Qingming Festival; operation time interval 6 is the Qingming Festival passenger flow peak; operation time interval 7 is the off-peak period between the Qingming Festival and the May Day (Labor Day) holiday; operation time interval 8 is the May Day passenger flow peak; operation time interval 9 is the off-peak period between May Day and the summer travel period; operation time interval 10 is the summer travel passenger flow peak; operation time interval 11 is the off-peak period between the summer travel period and the National Day holiday (October 1); operation time interval 12 is the National Day passenger flow peak; operation time interval 13 is the off-peak period after the National Day holiday and before New Year's Day.
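The day numbers quoted for the Spring Festival can be checked with the Python standard library. The dictionary and variable names below are illustrative only.

```python
from datetime import date

def day_of_year(d):
    """1-based day number within the year."""
    return d.timetuple().tm_yday

# Spring Festival dates stated in the text
spring_festival = {2014: date(2014, 1, 31), 2015: date(2015, 2, 19)}
offsets = {year: day_of_year(d) for year, d in spring_festival.items()}
shift = offsets[2015] - offsets[2014]  # days by which the 2015 festival falls later
```

The shift between the two years is consistent with the observation that the intervals around the Spring Festival travel period have different time spans in 2014 and 2015.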
S6) Correcting the operation period division result. Operation periods 3, 6 and 8 each span only one or a few days. For a high-speed railway passenger transport management department, making large-scale adjustments to the train operation scheme to meet passenger flow demand in such short periods would excessively disturb the existing transport plan and consume excessive manpower and material resources. Therefore, based on field experience, this embodiment merges the three operation periods shorter than 7 days into their adjacent operation periods. The corrected operation period division of the high-speed railway is shown in Table 2,
Table 2 Corrected operation period division results
Figure BDA0002030304960000092
Figure BDA0002030304960000101
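The correction rule of S6 can be sketched in Python as follows. The source only states that periods shorter than 7 days are merged with an adjacent period; this sketch assumes the short period is absorbed by the preceding one and that neighboring periods of the same class are then joined. The tuple layout and the class names in the example are hypothetical.

```python
def merge_short_periods(periods, min_days=7):
    """Merge short operation periods into the preceding period.

    periods: contiguous, sorted list of (start_day, end_day, label).
    A period shorter than min_days, or with the same label as the
    preceding period, is absorbed by the preceding period (assumption).
    """
    merged = []
    for start, end, label in periods:
        length = end - start + 1
        if merged and (length < min_days or merged[-1][2] == label):
            prev_start, _, prev_label = merged[-1]
            merged[-1] = (prev_start, end, prev_label)
        else:
            merged.append((start, end, label))
    return merged

# Hypothetical day ranges, not taken from Table 1
example = [(1, 20, "off"), (21, 40, "peak"), (41, 43, "super"),
           (44, 60, "peak"), (61, 90, "off"), (91, 93, "holiday"),
           (94, 120, "off")]
print(merge_short_periods(example))
```

A first period that is itself shorter than 7 days is left unchanged by this sketch, since it has no preceding period to join.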
The corrected operation period division of the high-speed railway can be summarized as follows: operation period 1 is the off-peak passenger flow period after New Year's Day and before the Spring Festival; operation period 2 is the Spring Festival travel rush peak; operation period 3 is the off-peak period between the Spring Festival travel rush and the summer travel rush; operation period 4 is the summer travel rush peak; operation period 5 is the off-peak period between the summer travel rush and the National Day holiday; operation period 6 is the National Day peak; operation period 7 is the off-peak period between the National Day holiday and New Year's Day. This division can serve as the basis for evaluating and adjusting the train operation scheme: within each operation period, the adaptability between the predicted passenger flow demand and the train operation scheme is evaluated, and if the result is unsatisfactory, the current train operation scheme needs to be adjusted;
S7) Passenger flow demand adaptability assessment. To show that a train operation scheme based on the operation period division adapts better to passenger flow demand, a train operation scheme is compiled from the mean passenger flow of each period on the basis of the division, and the adaptability indexes between the train operation scheme and the passenger flow demand of each period are calculated by simulation. The result is compared with the adaptability of the train operation scheme under the period division actually used in operation in 2014 and 2015, as shown in Table 3,
Table 3 Comparison with actual operating conditions
Figure BDA0002030304960000102
As can be seen from Table 3, with the number of large-scale adjustments of the train operation scheme unchanged, the scheme compiled from the operation period division of the neighbor propagation (affinity propagation) clustering algorithm adapts better to passenger flow demand. In 2014 the passenger flow demand satisfaction rate, the average train seat occupancy rate and the direct passenger flow rate increased by 7.6%, 16.7% and 14.1%, respectively; in 2015 the three indexes increased by 5.7%, 18.4% and 14.4%, respectively.
In this embodiment, daily passenger flow sending volume survey data of the stations along the line are used, time points with similar passenger flow within the year are clustered and merged with the neighbor propagation algorithm, the optimal number of clusters is determined from the CH, Hartigan and IGP indexes, and an annual operation period division method for high-speed railways is designed on this basis. The main conclusions are as follows:
(1) The operation period division method based on cluster analysis, combined with checking the results against cluster validity indexes, reflects the passenger flow demand in different periods of the year objectively and accurately and overcomes the high subjectivity, low efficiency and low precision of manual division, thereby laying a foundation for adaptive adjustment of the train operation scheme.
(2) A case study using the passenger flow sending volume statistics of the stations along a high-speed railway as samples shows that, once the optimal clustering result of the annual operation periods has been determined and analyzed manually, the whole year can be divided into reasonable operation periods.
The foregoing embodiments and description have been provided to illustrate the principles and preferred embodiments of the invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A high-speed railway passenger flow time interval division method based on neighbor propagation clustering, comprising the following steps:
S1) dividing one year into T time points, letting any high-speed railway line X comprise n important stations, and counting the passenger flow sending volume of each station in each time interval t_k in the up or down direction of line X to build a passenger flow sending matrix, i.e.
Figure FDA0003793262660000011
where X_k represents the passenger flow sending volume of high-speed railway line X in time interval t_k, and
Figure FDA0003793262660000012
represents the passenger flow sending volume of the n-th station of high-speed railway line X in time interval t_k;
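As an illustration of step S1, the matrix can be assembled from per-station counts with one row per time interval and one column per station. This is a minimal sketch; the record layout `(time_index, station, volume)` and the function name are assumptions made for the example.

```python
def build_flow_matrix(records, stations, T):
    """Build a T x n passenger flow sending matrix.

    records:  iterable of (time_index, station, volume), time_index in 0..T-1
    stations: ordered list of the n station names (column order)
    """
    col = {s: j for j, s in enumerate(stations)}
    X = [[0.0] * len(stations) for _ in range(T)]
    for k, station, volume in records:
        X[k][col[station]] += volume  # accumulate repeated counts
    return X

# Hypothetical counts for two stations over two time intervals
records = [(0, "A", 120), (0, "B", 80), (1, "A", 100), (1, "A", 30)]
X = build_flow_matrix(records, ["A", "B"], T=2)
print(X)
```

Row k of the result plays the role of X_k in the claim.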
S2) using a threshold δ to judge whether the passenger flow sending volume at each time point is abnormal; specifically, collecting the passenger flow sending vectors of the m time points adjacent to time point t_l and calculating their mean
Figure FDA0003793262660000017
if
Figure FDA0003793262660000013
the passenger flow sending volume data at time point t_l are considered non-abnormal; otherwise they are abnormal;
where X_l is the passenger flow sending volume at time point t_l;
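The anomaly check can be sketched as below. The exact inequality is given only as a figure in the source, so this sketch assumes a relative-deviation test: a time point is flagged when the distance between X_l and the neighbor mean exceeds δ times the norm of that mean. The window layout (m/2 points on each side) is also an assumption.

```python
import math

def neighbor_mean(X, l, m):
    """Mean vector of up to m time points around index l (l itself excluded)."""
    half = m // 2
    idx = [j for j in range(l - half, l + half + 1)
           if j != l and 0 <= j < len(X)]
    n = len(X[0])
    return [sum(X[j][c] for j in idx) / len(idx) for c in range(n)]

def is_abnormal(X, l, m, delta):
    """Assumed criterion: ||X_l - mean|| > delta * ||mean||."""
    mean = neighbor_mean(X, l, m)
    dev = math.sqrt(sum((a - b) ** 2 for a, b in zip(X[l], mean)))
    scale = math.sqrt(sum(b * b for b in mean))
    return dev > delta * scale

# Hypothetical single-station series with one spike at index 3
series = [[100.0], [102.0], [99.0], [300.0], [98.0], [101.0], [100.0]]
print(is_abnormal(series, 3, m=4, delta=0.3))
```

Note that a large outlier also shifts the mean seen by its immediate neighbors, so in practice the check would be run before neighbors are judged against a contaminated window, or with a more robust center.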
if the data are abnormal, deleting them and repairing the passenger flow sending volume by Lagrange interpolation fitted to the passenger flow data of the m time points adjacent to that time point, with the calculation formulas:
Figure FDA0003793262660000014
Figure FDA0003793262660000015
where l_k(t) is a fitting polynomial of degree k-1, L_n(t) is the Lagrange interpolation polynomial, t is the time point to be fitted, t_j is the j-th time point, t_i is the i-th time point, X_i is the passenger flow sending matrix at the i-th time point, and l_i(t) is a fitting polynomial of degree i-1;
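A minimal sketch of the repair step, assuming the textbook Lagrange form L(t) = Σ_i X_i · l_i(t) with l_i(t) = Π_{j≠i} (t − t_j)/(t_i − t_j). The formulas in the source are images, so this form is an assumption, and the sketch handles one station (a scalar series) at a time.

```python
def lagrange_interpolate(ts, xs, t):
    """Evaluate the Lagrange polynomial through (ts[i], xs[i]) at t."""
    total = 0.0
    for i, (ti, xi) in enumerate(zip(ts, xs)):
        basis = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                basis *= (t - tj) / (ti - tj)  # basis polynomial l_i(t)
        total += xi * basis
    return total

# Repair a deleted value at t = 3 from its four neighbors (hypothetical data)
neighbors_t = [1, 2, 4, 5]
neighbors_x = [1.0, 4.0, 16.0, 25.0]
repaired = lagrange_interpolate(neighbors_t, neighbors_x, 3)
print(repaired)
```

For a line with n stations the same call would be repeated per station column.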
merging the repaired data with the original normal data and then standardizing by the standard deviation to eliminate scale differences between variables, with the calculation formula:
Figure FDA0003793262660000016
where Z-score is the standardized value, x is the passenger flow sending volume of a station at a given time point,
Figure FDA0003793262660000021
is the mean passenger flow sending volume of all stations over the year, and σ is the standard deviation of the passenger flow sending volume of all stations over the year;
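The standardization can be sketched as z = (x − mean) / σ. Whether σ is the population or the sample standard deviation is not stated in the source; the population form (`statistics.pstdev`) is assumed here.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of sending volumes to zero mean and unit deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical volumes with mean 5 and population standard deviation 2
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(scores)
```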
S3) dividing time intervals with the neighbor propagation clustering algorithm: the passenger flow sending volume samples of the T time points in the data set are all taken as candidate class representatives, and the similarity s(i, k) of the passenger flow sending volumes of any two time intervals is evaluated; s(i, k) expresses the similarity between the passenger flow sending volume sample X_k of time interval k and the sample X_i of time interval i, i.e., how well sample X_k is suited to serve as the class representative of sample X_i; when the algorithm is initialized, all samples are assumed equally likely to be class representatives, i.e., all s(k, k) are set to the same preference value p, taken as the median; the similarity of any two samples is calculated as:
s(i, k) = -||x_i - x_k||^2
where x_i is the passenger flow sending volume sample at time i and x_k is the passenger flow sending volume sample at time k;
defining a responsibility matrix r and an availability matrix a, where r(i, k) is sent from sample x_i to sample x_k and expresses how well x_k is suited to serve as the class representative of x_i, and a(i, k) is sent from sample x_k to sample x_i and expresses how appropriate it is for x_i to choose x_k as its class representative; for any sample x_i, calculating the sum of the responsibility r(i, k) and the availability a(i, k) with respect to the passenger flow sending volumes of the other time intervals; the sample x_k for which this sum is largest is the class representative of x_i; and outputting the classification results of all time points;
the method specifically comprises the following steps:
S301) setting the initial values of the responsibility matrix r(i, k) and the availability matrix a(i, k) to 0;
S302) calculating the similarity matrix s(i, k) of the passenger flow sending volume samples of any two time intervals, with the negative squared Euclidean distance as the measure, i.e., s(i, k) = -||x_i - x_k||^2,
and setting the diagonal elements s(k, k) to the same preference value (the median), i.e.
Figure FDA0003793262660000022
where N is the number of samples;
S303) updating the responsibility matrix r(i, k) and the availability matrix a(i, k), where the update formula for r(i, k) is:
Figure FDA0003793262660000023
and the update formula for the availability matrix a(i, k) is:
Figure FDA0003793262660000031
S304) setting a damping factor λ to suppress numerical oscillation during the iteration, i.e.,
Figure FDA0003793262660000032
where r_new(i, k) and r_old(i, k) are the responsibility matrices obtained from the current and the previous update, respectively; a_new(i, k) and a_old(i, k) are the availability matrices obtained from the current and the previous update, respectively; λ ∈ (0, 1) is the damping factor;
S305) calculating the sum of the responsibility and the availability between any passenger flow sending volume sample and all passenger flow sending volume samples and, according to
Figure FDA0003793262660000033
finding the class center sample of each sample;
S306) updating the current iteration count N ← N + 1 and checking it against the set maximum number of iterations N_max: while N ≤ N_max, returning to step S302); once the maximum number of iterations has been reached, terminating the algorithm and outputting the class division results of all time points;
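Steps S301 to S306 can be sketched in pure Python as follows. Because the update formulas in the source are images, the sketch uses the standard affinity propagation message updates, which is an assumption about what those figures contain; the preference is taken as the median of the off-diagonal similarities, and the function names, default damping and iteration count are illustrative.

```python
from statistics import median

def build_similarity(X):
    """S302: negative squared Euclidean distances, median preference on the diagonal."""
    N = len(X)
    S = [[-sum((a - b) ** 2 for a, b in zip(X[i], X[k])) for k in range(N)]
         for i in range(N)]
    p = median(S[i][k] for i in range(N) for k in range(N) if i != k)
    for k in range(N):
        S[k][k] = p
    return S

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Return, for each sample, the index of its class representative."""
    N = len(S)
    R = [[0.0] * N for _ in range(N)]  # S301: responsibilities start at 0
    A = [[0.0] * N for _ in range(N)]  # S301: availabilities start at 0
    for _ in range(max_iter):          # S306: fixed iteration budget
        # S303/S304: damped responsibility update
        for i in range(N):
            vals = [A[i][k] + S[i][k] for k in range(N)]
            first = max(range(N), key=vals.__getitem__)
            best = vals[first]
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(N):
                new = S[i][k] - (second if k == first else best)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # S303/S304: damped availability update
        for k in range(N):
            pos = [max(0.0, R[i][k]) for i in range(N)]
            pos[k] = R[k][k]           # self-responsibility is not clipped
            total = sum(pos)
            for i in range(N):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # S305: representative of i maximizes r(i, k) + a(i, k)
    return [max(range(N), key=lambda k: R[i][k] + A[i][k]) for i in range(N)]

# Hypothetical one-station data with two well-separated groups
points = [[0.0], [1.0], [2.0], [20.0], [21.0], [22.0]]
labels = affinity_propagation(build_similarity(points))
print(labels)
```

The sketch runs a fixed number of iterations instead of testing for convergence, which keeps it short at the cost of some wasted work on easy inputs.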
S4) calculating the Calinski-Harabasz (CH), Hartigan and In-Group Proportion (IGP) indexes for the class division results with different numbers of classes, and selecting the optimal number of classes and the corresponding class division result;
S5) checking and correcting the operation period division result: traversing all classes and comparing the samples pairwise; if the time points corresponding to two samples are adjacent, merging them into one operation period; otherwise, assigning them to different operation periods;
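Step S5 amounts to turning the per-day class labels into runs of consecutive days. A minimal sketch, assuming days are numbered from 1 and that adjacent days with the same class label form one operation period:

```python
from itertools import groupby

def operation_periods(labels):
    """Group consecutive equal labels into (start_day, end_day, label) runs."""
    periods = []
    day = 1
    for label, run in groupby(labels):
        length = len(list(run))
        periods.append((day, day + length - 1, label))
        day += length
    return periods

# Hypothetical class labels for six consecutive days
print(operation_periods([0, 0, 1, 1, 1, 0]))
```

The same class can therefore appear in several operation periods, which is how a small number of classes can yield a larger number of periods over the year.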
S6) evaluating passenger flow demand adaptability: on the basis of the operation period division, compiling a train operation scheme from the mean passenger flow demand of each period, simulating the passenger flow assignment, and introducing three indexes, namely the passenger flow demand satisfaction rate, the average train seat occupancy rate and the direct passenger flow rate, to quantitatively evaluate and summarize the adaptability between the passenger flow demand and the train operation scheme in each period.
2. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S4), the Calinski-Harabasz index is a measure based on the within-class and between-class scatter matrices of all samples, and the number of classes at which it attains its maximum is taken as the optimal number of clusters, i.e.
Figure FDA0003793262660000041
where k is the number of clusters, trB(k) is the trace of the between-class scatter matrix, trW(k) is the trace of the within-class scatter matrix, and n is the number of time-point samples.
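A sketch of the index, assuming the usual definition CH(k) = [trB(k)/(k − 1)] / [trW(k)/(n − k)], which matches the symbols listed above. The function names are illustrative.

```python
def scatter_traces(X, labels):
    """Return (trB, trW, k) for samples X (list of vectors) and class labels."""
    n, d = len(X), len(X[0])
    overall = [sum(x[c] for x in X) / n for c in range(d)]
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    trW = trB = 0.0
    for members in groups.values():
        center = [sum(x[c] for x in members) / len(members) for c in range(d)]
        trW += sum((x[c] - center[c]) ** 2 for x in members for c in range(d))
        trB += len(members) * sum((center[c] - overall[c]) ** 2 for c in range(d))
    return trB, trW, len(groups)

def calinski_harabasz(X, labels):
    trB, trW, k = scatter_traces(X, labels)
    return (trB / (k - 1)) / (trW / (len(X) - k))

# Hypothetical data: trB = 100, trW = 4, k = 2, n = 4
ch = calinski_harabasz([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1])
print(ch)
```

The index is undefined for k = 1 and for a partition with zero within-class scatter, so callers would evaluate it only for 2 ≤ k < n.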
3. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S4), the Hartigan index can also be applied when the number of clusters is 1, and the smallest number of clusters satisfying Ha ≤ 10 is taken as the optimal number of clusters, i.e.
Figure FDA0003793262660000042
where k is the number of clusters, trW(k) is the trace of the within-class scatter matrix, and n is the number of time-point samples.
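A sketch of the selection rule, assuming the usual definition Ha(k) = (trW(k)/trW(k + 1) − 1)(n − k − 1). The trW values in the example are hypothetical.

```python
def hartigan(trW_k, trW_k1, n, k):
    """Hartigan index for k clusters, given trW(k) and trW(k + 1)."""
    return (trW_k / trW_k1 - 1.0) * (n - k - 1)

def best_k_hartigan(trW, n, threshold=10.0):
    """trW: dict mapping k -> trW(k). Return the smallest k with Ha(k) <= threshold."""
    for k in sorted(trW):
        if k + 1 in trW and hartigan(trW[k], trW[k + 1], n, k) <= threshold:
            return k
    return None

# Hypothetical within-class traces for k = 1..4 and n = 50 samples
best = best_k_hartigan({1: 1000.0, 2: 100.0, 3: 90.0, 4: 85.0}, n=50)
print(best)
```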
4. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S4), the In-Group Proportion (IGP) index measures whether the sample nearest to each sample of a class belongs to the same class; the larger the mean IGP over all clusters, the better the clustering quality, and the number of classes at which it attains its maximum is taken as the optimal number of clusters, i.e.
Figure FDA0003793262660000043
where u is the class label of a cluster, class(j) is the class label of sample j, j_N is the sample nearest to sample j, and # denotes the number of samples satisfying the condition.
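A sketch of the index for one class u, assuming IGP(u) = #{j : class(j) = class(j_N) = u} / #{j : class(j) = u} with Euclidean nearest neighbors. The data are hypothetical.

```python
def in_group_proportion(X, labels, u):
    """Share of samples in class u whose nearest neighbor is also in class u."""
    def nearest(j):
        return min((i for i in range(len(X)) if i != j),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
    members = [j for j, lab in enumerate(labels) if lab == u]
    hits = sum(1 for j in members if labels[nearest(j)] == u)
    return hits / len(members)

# The third sample is labeled 0 but lies closer to class 1
X = [[0.0], [1.0], [5.0], [6.5], [7.0]]
labels = [0, 0, 0, 1, 1]
igp0 = in_group_proportion(X, labels, 0)
igp1 = in_group_proportion(X, labels, 1)
print(igp0, igp1)
```

Averaging the value over all classes gives the overall score that the claim maximizes.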
5. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S6), the passenger flow demand satisfaction rate reflects the degree to which the passenger transport capacity provided by the train operation scheme satisfies the passenger flow demand in each passenger flow direction of the high-speed railway and the related network; under a given train operation scheme, with the rated train passenger capacity as the transport resource constraint, it is expressed as the ratio of the passenger volume that effectively receives transport service to the total passenger flow demand, and is calculated as:
Figure FDA0003793262660000051
where q'_w is the total passenger volume carried by the high-speed railway for passenger flow OD pair w, and w is the index of the passenger flow direction (OD pair) in the network.
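Read as a ratio of totals, the index can be sketched as below. The formula itself is an image in the source, so the form Σ_w q'_w / Σ_w q_w is an assumption, as is the symbol q_w for the demand of OD pair w; the figures are hypothetical.

```python
def demand_satisfaction_rate(served, demand):
    """served[w]: passengers carried for OD pair w; demand[w]: passengers wanting to travel."""
    return sum(served.values()) / sum(demand.values())

# Hypothetical OD pairs
served = {("A", "B"): 900, ("A", "C"): 450}
demand = {("A", "B"): 1000, ("A", "C"): 500}
rate = demand_satisfaction_rate(served, demand)
print(rate)
```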
6. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S6), the average train seat occupancy rate is the mean of the seat occupancy rates of all trains within the evaluation range, where the seat occupancy rate of a train is the weighted ratio of the passenger volume it carries over the sections it runs to the total number of seats it provides; the index reflects how passengers of different OD pairs choose among the various types of high-speed trains, and is calculated as:
Figure FDA0003793262660000052
where
Figure FDA0003793262660000053
is the passenger volume of train h in section (i, j), A_h is the rated passenger capacity of train h, and E_h is the number of sections over which train h runs.
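One plausible reading of the index is sketched below: the occupancy of train h is taken as the sum of its section loads divided by A_h · E_h, and the index is the plain mean over trains. The exact weighting is in a figure that is not reproduced in the text, so this form is an assumption; the loads are hypothetical.

```python
def train_occupancy(section_loads, seats):
    """Section loads of one train divided by seats times number of sections."""
    return sum(section_loads) / (seats * len(section_loads))

def average_occupancy(trains):
    """trains: list of (section_loads, seats) pairs, one per train."""
    rates = [train_occupancy(loads, seats) for loads, seats in trains]
    return sum(rates) / len(rates)

# Two hypothetical trains: three sections with 500 seats, two sections with 600 seats
avg = average_occupancy([([400, 500, 300], 500), ([600, 600], 600)])
print(avg)
```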
7. The high-speed railway passenger flow time interval division method based on neighbor propagation clustering according to claim 1, characterized in that: in step S6), the passenger flow demand structure consists of different demand directions, each of which is served either by a direct train or by a travel plan with transfers; the direct passenger flow rate is, under a given train operation scheme and passenger flow demand structure, the ratio of the passenger volume that reaches its destination directly without transfer to the total passenger volume between all pairs of passenger flow demand points, and is calculated as:
Figure FDA0003793262660000054
where w is the index of the passenger flow direction (OD pair) in the network, |e| is the number of transfers of the passenger flow in a given direction,
Figure FDA0003793262660000055
is the number of passengers of OD pair w who reach their destination directly without transfer, and
Figure FDA0003793262660000056
is the number of passengers of OD pair w who reach their destination with |e| transfers.
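The ratio described in this claim can be sketched as direct passengers divided by direct plus transferring passengers, summed over all OD pairs. The formula in the source is an image, so this form and the data layout are assumptions; the figures are hypothetical.

```python
def direct_rate(direct, transfer):
    """direct[w]: passengers without transfer.

    transfer[w]: list of passenger counts, one entry per number of transfers
    (1 transfer, 2 transfers, ...).
    """
    total_direct = sum(direct.values())
    total_transfer = sum(sum(counts) for counts in transfer.values())
    return total_direct / (total_direct + total_transfer)

# Hypothetical OD pairs w1 and w2
direct = {"w1": 800, "w2": 300}
transfer = {"w1": [150, 50], "w2": [100]}
dr = direct_rate(direct, transfer)
print(dr)
```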
CN201910307332.7A 2019-04-17 2019-04-17 High-speed railway passenger flow time interval division method based on neighbor propagation clustering Active CN110119884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910307332.7A CN110119884B (en) 2019-04-17 2019-04-17 High-speed railway passenger flow time interval division method based on neighbor propagation clustering

Publications (2)

Publication Number Publication Date
CN110119884A CN110119884A (en) 2019-08-13
CN110119884B true CN110119884B (en) 2022-09-13

Family

ID=67521058

Country Status (1)

Country Link
CN (1) CN110119884B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181955B (en) * 2020-09-01 2022-12-09 西南交通大学 Data standard management method for information sharing of heavy haul railway comprehensive big data platform
CN112749836B (en) * 2020-12-22 2022-07-29 蓝海(福建)信息科技有限公司 Customized passenger transport intelligent transportation capacity allocation method based on passenger flow time sequence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105857350A (en) * 2016-03-17 2016-08-17 中南大学 High-speed rail train driving method based on section profile passenger flow
WO2017063356A1 (en) * 2015-10-14 2017-04-20 深圳市天行家科技有限公司 Designated-driving order predicting method and designated-driving transport capacity scheduling method
CN107145985A (en) * 2017-05-09 2017-09-08 北京城建设计发展集团股份有限公司 A kind of urban track traffic for passenger flow Regional Linking method for early warning
JP2018503920A (en) * 2015-01-27 2018-02-08 ベイジン ディディ インフィニティ テクノロジー アンド ディベロップメント カンパニー リミティッド Method and system for providing on-demand service information
CN108665083A (en) * 2017-03-31 2018-10-16 江苏瑞丰信息技术股份有限公司 A kind of method and system for advertisement recommendation for dynamic trajectory model of being drawn a portrait based on user
CN108805344A (en) * 2018-05-29 2018-11-13 五邑大学 A kind of high-speed railway network train running scheme optimization method considering time-dependent demand

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
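Study on wheel-rail damage behavior under different axle loads; Liu Jihua et al.; Journal of Wuyi University (Natural Science Edition); 2017-11-15 (No. 04); full text *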

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant