CN117807450B - Urban intelligent public transportation system and method - Google Patents

Urban intelligent public transportation system and method

Info

Publication number
CN117807450B
Authority
CN
China
Prior art keywords
user
matching
data
data set
travel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410002831.6A
Other languages
Chinese (zh)
Other versions
CN117807450A (en)
Inventor
王雪峰
罗磊
钱晓艳
严乐宝
张伟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Henglong Intelligent Technology Group Co ltd
Original Assignee
Zhejiang Henglong Intelligent Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Henglong Intelligent Technology Group Co ltd
Priority to CN202410002831.6A
Publication of CN117807450A
Application granted
Publication of CN117807450B
Legal status: Active
Anticipated expiration


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 Adapting or protecting infrastructure or their operation
    • Y02A30/60 Planning or developing urban green infrastructure

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an urban intelligent public transportation system and method in the field of public transportation. The system links card-swiping records to actual vehicle tracks. It then evaluates the resulting matching data set with an inter-cluster density index and an order-related pattern abundance index, which assesses in advance whether regularity analysis is feasible. For user travel pattern recognition and riding intention prediction, user behavior is predicted from the matched card-swiping records and tracks using statistical analysis and machine learning. Stops and roads are abstracted into a network graph, user characteristics are fused into it, and reachability is calculated with Dijkstra's algorithm. Model fit is evaluated through a matching degree, which determines whether the scene factors are updated. This improves the ability to adapt to changes in urban traffic and provides real-time support for public transportation planning. The integrated process helps optimize data processing, improves the accuracy of regularity analysis, and supports better urban traffic efficiency and public transportation operation.

Description

Urban intelligent public transportation system and method
Technical Field
The invention relates to the field of public transportation, in particular to an urban intelligent public transportation system and method.
Background
Because of problems such as information asymmetry and a low sharing rate, existing public transportation capacity cannot be fully utilized. As a result, the traffic congestion index remains persistently high and traffic infrastructure remains insufficient. In a complex urban traffic system, existing approaches only adapt passively to demand from the supply side, which makes it difficult to fundamentally resolve the imbalance between urban traffic supply and demand. At the root of the traffic problem, the mismatch between the spatio-temporal patterns of residents' activities and the spatial structure of the city is one of the key factors behind many urban problems. This mismatch causes increasingly severe traffic congestion.
In order to solve the above problems, a technical solution is now provided.
Disclosure of Invention
To overcome the above shortcomings of the prior art, embodiments of the invention provide an urban intelligent public transportation system and method. The association between card-swiping records and actual tracks is established to obtain a matching data set. The inter-cluster density index and the order-related pattern abundance index of that data set are calculated, and the resulting evaluation coefficient is compared with an analyzability threshold to generate an analyzability signal. This mechanism assesses the feasibility of regularity analysis in advance, selects base data that exhibit regularity in a targeted way, and improves the accuracy of regularity analysis. For travel pattern recognition and riding intention inference, travel patterns are identified by matching card-swiping records with tracks and applying clustering and similar methods. The user's riding intention is then inferred through statistical analysis and machine learning, providing substantive support for traffic planning. Stops and roads are abstracted into a network graph, user characteristics are fused into it, and reachability is calculated with Dijkstra's algorithm. The goodness of fit is evaluated through the matching degree to decide whether the scene factors should be updated. This improves the model's ability to adapt to changes in urban traffic and provides real-time support for public transportation planning. The integrated process helps optimize data processing, improves the accuracy of regularity analysis, and provides reliable support for improving urban traffic efficiency and public transportation operation, thereby solving the problems described in the background.
In order to achieve the above purpose, the present invention provides the following technical solutions:
Step S100: organize the bus card data and track data. Use dynamic time warping to compare the card-swiping record sequence with the track sequence and calculate a similarity score between them. This captures the relationship between card-swiping records and actual tracks through a dynamic trade-off between time and space, and yields a matching data set;
Step S200: judge the user's travel regularity according to the matching data set;
Step S300: according to the judgment result, identify the user's travel pattern from the card-swiping records and analyze the user's behavior. Then use a machine learning model to infer the user's riding intention from historical travel patterns;
Step S400: integrate the user travel characteristics with the network analysis of a bus reachability model by fusing individual user characteristics into the bus stop network. Calculate real-time reachability using dynamic scene factors, optimize the weights, recalculate the matching degree, and adjust the update frequency of the dynamic scene factors based on the matching degree.
In a preferred embodiment, step S100 includes the following:
Step one, acquire the bus card data and track data;
Step two, perform a time alignment operation so that the two data sources share the same time scale: set a time threshold and select the track data within that threshold to match each card-swiping record;
Step three, compare the longitude-latitude sequence of the card-swiping record with the longitude-latitude sequence of the track, and use the dynamic time warping algorithm to search for the alignment that minimizes the cumulative distance;
Step four, set a similarity threshold. If the similarity score is greater than the threshold, treat the card-swiping record and the track as a matched pair and include the pair in the matching data set.
In a preferred embodiment, step S200 specifically includes the following:
The matching data set is used to compile basic information for each user and to judge whether the user's travel follows a regular pattern. The inter-cluster density index is obtained with a clustering algorithm, and the order-related pattern abundance index is obtained with a rank-order algorithm.
In a preferred embodiment, the inter-cluster density index is obtained by:
Step one, prepare the matching data set containing the samples to be clustered;
Step two, divide the data into K clusters using K-means clustering;
Step three, run the selected clustering algorithm to obtain the cluster to which each sample belongs;
Step four, calculate the intra-cluster and inter-cluster distances of each sample;
Step five, obtain the silhouette (contour) coefficient of each sample;
Step six, average the silhouette coefficients of all samples to obtain the inter-cluster density index.
In a preferred embodiment, the order-related pattern abundance index is obtained by:
Step one, prepare the matching data set containing the samples to be analyzed;
Step two, rank the related variables of the samples, i.e., order the feature values of the samples by magnitude;
Step three, for each sample, calculate the difference between its ranks on the two related variables, i.e., the rank difference;
Step four, square the rank differences and calculate the order-related pattern abundance index from the sum of the squared differences.
In a preferred embodiment, the inter-cluster density index and the order-related pattern abundance index are normalized and combined by weighting to obtain an evaluation coefficient. The evaluation coefficient is compared with an analyzability threshold. If it is greater than or equal to the threshold, a high-analyzability signal is generated; if it is smaller than the threshold, a low-analyzability signal is generated.
In a preferred embodiment, step S300 specifically includes the following:
Use the matching data set that produced a high-analyzability signal as base data, extract the user's features, and define the user's riding labels. Construct a training data set from historical matching data, containing user features and the corresponding riding intention labels. Use a random forest as the machine learning model and train it on the constructed data set. For a user to be predicted, extract the corresponding features and predict the user's riding intention with the trained model.
In a preferred embodiment, step S400 specifically includes the following:
fuse the user travel data and acquire public transportation network data and scene factor data;
abstract the bus stops and the roads or paths between them into a network graph;
integrate the users' travel characteristics into the network and calculate the basic reachability between bus stops using Dijkstra's algorithm or similar methods; adjust the time and space weights according to the periodically updated dynamic scene factors;
combine the user's travel characteristics with the calculated reachability to form a comprehensive user travel characteristic-reachability model;
calculate the matching degree between the fused user characteristic-reachability model and actual bus trip data.
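The reachability step can be illustrated with a standard heap-based Dijkstra search over a stop graph whose edge weights are travel times. The source does not define how scene factors enter the weights. Here a dynamic scene factor is modelled as a multiplier on each edge's base time (for example, 2.0 on a congested link), and a stop counts as "reachable" if it lies within a time budget. Both choices, and all names below, are hypothetical and serve only as a sketch.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # stop -> [(neighbour stop, base travel minutes)]
SceneFactor = Callable[[str, str], float]    # hypothetical per-edge multiplier


def dijkstra(graph: Graph, source: str, scene_factor: Optional[SceneFactor] = None) -> Dict[str, float]:
    """Shortest travel time from source to every reachable stop."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry, a shorter path was already settled
        done.add(u)
        for v, base in graph.get(u, []):
            factor = scene_factor(u, v) if scene_factor else 1.0
            nd = d + base * factor
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reachable(graph: Graph, source: str, budget: float,
              scene_factor: Optional[SceneFactor] = None) -> List[str]:
    """Stops whose shortest travel time from source is within the budget."""
    return sorted(s for s, t in dijkstra(graph, source, scene_factor).items() if t <= budget)


def congested(u: str, v: str) -> float:
    """Example scene factor: the B->C link takes twice as long."""
    return 2.0 if (u, v) == ("B", "C") else 1.0


stops: Graph = {"A": [("B", 5), ("C", 15)], "B": [("C", 5), ("D", 20)], "C": [("D", 5)], "D": []}
print(dijkstra(stops, "A"))                  # {'A': 0.0, 'B': 5.0, 'C': 10.0, 'D': 15.0}
print(reachable(stops, "A", 12))             # ['A', 'B', 'C']
print(reachable(stops, "A", 12, congested))  # ['A', 'B']
```

Re-running the search whenever the scene factors are updated gives the "real-time" reachability described above, at the cost of one Dijkstra pass per source stop.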
In a preferred embodiment, the matching degree is compared with a matching threshold. If the matching degree is greater than or equal to the threshold, no signal is generated; if it is smaller than the threshold, a signal to update the scene factor frequency is generated.
An urban intelligent public transportation system comprises a similarity calculation module, a rule matching module, a behavior analysis module and a characteristic integration module;
The similarity calculation module organizes bus card data and track data and uses dynamic time warping to compare the card-swiping record sequence with the track sequence and calculate their similarity score. It captures the relationship between card-swiping records and actual tracks through a dynamic trade-off between time and space, obtains the matching data set, and sends it to the rule matching module;
The rule matching module judges the user's travel regularity according to the matching data set and sends the judgment result to the behavior analysis module;
The behavior analysis module identifies the user's travel pattern from the card-swiping records according to the judgment result and analyzes the user's behavior. It then uses the machine learning model to infer the user's riding intention from historical travel patterns and sends the travel characteristics to the characteristic integration module;
The characteristic integration module integrates the user travel characteristics with the network analysis of the bus reachability model and fuses individual user characteristics into the bus stop network. It calculates real-time reachability using the dynamic scene factors, optimizes the weights, recalculates the matching degree, and adjusts the update frequency of the dynamic scene factors based on the matching degree.
The urban intelligent public transportation system and method of the invention have the following technical effects and advantages:
1. After the user's matching data set is acquired, its inter-cluster density index and order-related pattern abundance index are obtained, normalized, and weighted to calculate the evaluation coefficient. This coefficient is compared with the analyzability threshold to generate a high- or low-analyzability signal, so the feasibility of regularity analysis is evaluated in advance. The quantitative evaluation coefficient makes it possible to select, in a targeted way, base data that exhibit regularity and trends, which improves the accuracy and efficiency of regularity analysis. This pre-evaluation mechanism helps streamline the data processing flow and reduce unnecessary computation and analysis. Meaningful information is therefore mined more effectively, and subsequent regularity analysis rests on a more reliable basis.
2. For travel pattern identification and riding intention inference, the invention identifies the user's travel pattern by matching card-swiping records with actual tracks and applying spatio-temporal clustering and similar methods. User behavior features such as travel frequency, frequently used lines, and travel times are then extracted through basic statistical analysis and track analysis, and a machine learning model is trained to infer the user's riding intention. The whole process supports an in-depth understanding of users' travel behavior and yields useful behavior analysis results. It also provides substantive support for urban traffic planning and public transportation service optimization. By identifying travel patterns, the system can understand user demand more accurately and offer personalized public transportation services. This improves traffic efficiency, relieves congestion, and gives users a more convenient and comfortable travel experience.
3. Bus stops and roads are abstracted into a network graph, users' travel characteristics are fused into it, and basic reachability is calculated with Dijkstra's algorithm. Time and space weights are considered jointly to form a user characteristic-reachability model. The calculated matching degree is compared with a threshold to evaluate the model's goodness of fit and to decide whether to update the scene factors. This integration effectively improves how well the model fits actual travel. By dynamically adjusting the weights and the update frequency, the system adapts more accurately to changes in the urban traffic environment. It thereby provides useful real-time data support for public transportation planning and helps the relevant departments evaluate the city's underlying operating mechanisms. It also improves the operating efficiency of the public transportation system and supports policies for optimizing urban spatial structure.
Drawings
FIG. 1 is a schematic flow diagram of an urban intelligent public transportation system and method according to the present invention;
FIG. 2 is a schematic diagram of a system and method for urban intelligent public transportation according to the present invention;
FIG. 3 is a schematic diagram of a flow for obtaining an inter-cluster density index of an urban intelligent public transportation system and method according to the present invention;
FIG. 4 is a schematic flow chart for obtaining the order-related pattern abundance index in the urban intelligent public transportation system and method according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
FIG. 1 shows an urban intelligent public transportation method, which is characterized in that:
Step S100: organize the bus card data and track data. Use dynamic time warping to compare the card-swiping record sequence with the track sequence and calculate a similarity score between them. This captures the relationship between card-swiping records and actual tracks through a dynamic trade-off between time and space, and yields a matching data set;
Step S200: judge the user's travel regularity according to the matching data set;
Step S300: according to the judgment result, identify the user's travel pattern from the card-swiping records and analyze the user's behavior. Then use a machine learning model to infer the user's riding intention from historical travel patterns;
Step S400: integrate the user travel characteristics with the network analysis of a bus reachability model by fusing individual user characteristics into the bus stop network. Calculate real-time reachability using dynamic scene factors, optimize the weights, recalculate the matching degree, and adjust the update frequency of the dynamic scene factors based on the matching degree.
Step S100 includes the following:
step one:
The bus card data include the card number (Card ID), card-swiping time (Swipe Time), and boarding stop (Boarding Stop). First, the card numbers are checked for uniqueness, and any outliers or duplicate records are handled. The swipe records are then sorted by swiping time for the subsequent time alignment.
The trajectory data include the vehicle number (Vehicle ID), timestamp (Timestamp), latitude (Latitude), and longitude (Longitude). Likewise, the vehicle numbers are checked and possible outliers are handled. The records are then sorted by timestamp so that they can be matched with the bus card data.
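A minimal Python sketch of this cleaning step follows. It assumes each record is a dictionary. A record is treated as an outlier when any key field is missing and as a duplicate when all key fields repeat. The patent does not specify its exact outlier rules, so these rules are assumptions, and the helper name `clean_and_sort` and the field names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def clean_and_sort(records: Iterable[dict], key_fields: tuple, time_field: str) -> List[dict]:
    """Drop incomplete and duplicate records, then sort the rest by time_field."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if any(v is None for v in key):
            continue  # missing field: treated as an outlier (assumed rule)
        if key in seen:
            continue  # exact repeat of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return sorted(cleaned, key=lambda rec: rec[time_field])


# Hypothetical bus card records: one duplicate and one without a card ID.
swipes = [
    {"card_id": "C1", "swipe_time": datetime(2024, 1, 2, 8, 5), "boarding_stop": "S2"},
    {"card_id": "C1", "swipe_time": datetime(2024, 1, 2, 7, 50), "boarding_stop": "S1"},
    {"card_id": "C1", "swipe_time": datetime(2024, 1, 2, 8, 5), "boarding_stop": "S2"},
    {"card_id": None, "swipe_time": datetime(2024, 1, 2, 9, 0), "boarding_stop": "S3"},
]
cleaned = clean_and_sort(swipes, ("card_id", "swipe_time", "boarding_stop"), "swipe_time")
print([r["boarding_stop"] for r in cleaned])  # ['S1', 'S2']
```

The same helper can be applied to the trajectory data. In that case, `("vehicle_id", "timestamp", "latitude", "longitude")` serves as the key fields and `"timestamp"` as the sort field.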
Step two:
To ensure that both data sources share the same time scale, a time alignment operation is performed. A time threshold τ is set, and the trajectory data within the threshold are selected to match each card-swiping record. The time alignment formula is: $\mathrm{TimeAlignment}(T) = \arg\min_{\Delta t} \left| \mathrm{SwipeTime} - (\mathrm{Timestamp} + \Delta t) \right|$, subject to $|\Delta t| < \tau$;
This formula aligns the swiping time of each bus card record with the timestamps of the GPS track data, so that both data sources share the same time scale for the subsequent trajectory similarity calculation.
Wherein:
TimeAlignment(T) is a function giving the optimal time alignment between the card-swiping time and the track timestamp. It adjusts the time offset to find the point at which the two times are closest;
SwipeTime is the time of the bus card-swiping record;
Timestamp is the timestamp of the GPS track record;
Δt is the time offset;
τ is the time threshold, i.e., the time window within which the optimal alignment is searched for.
The meaning of this formula is to find a time offset Δt such that the time of the swipe record and the timestamp of the track record are closest in this time range. That is, the bus swipe record and the GPS track record that most match in time are found by adjusting Δt.
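One reading of the alignment rule is this: for each swipe, pick the GPS record whose timestamp is closest to the swipe time, provided the offset stays below τ. The Python sketch below implements that reading with a binary search over sorted timestamps. The function name `align_swipe` and the data are illustrative assumptions, not the patent's implementation.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Optional


def align_swipe(swipe_time: datetime, gps_times: List[datetime], tau: timedelta) -> Optional[int]:
    """Index of the GPS record closest to swipe_time with |offset| < tau, else None.

    gps_times must be sorted in ascending order.
    """
    i = bisect_left(gps_times, swipe_time)
    best_idx, best_offset = None, None
    # Only the neighbours on either side of the insertion point can be closest.
    for j in (i - 1, i):
        if 0 <= j < len(gps_times):
            offset = abs(swipe_time - gps_times[j])
            if offset < tau and (best_offset is None or offset < best_offset):
                best_idx, best_offset = j, offset
    return best_idx


gps = [datetime(2024, 1, 2, 8, 0, 0), datetime(2024, 1, 2, 8, 0, 30), datetime(2024, 1, 2, 8, 1, 0)]
tau = timedelta(seconds=60)
print(align_swipe(datetime(2024, 1, 2, 8, 0, 40), gps, tau))  # 1
print(align_swipe(datetime(2024, 1, 2, 8, 10, 0), gps, tau))  # None
```

Sorting the GPS timestamps once, as the cleaning step already does, lets each swipe be aligned in logarithmic time instead of scanning the whole track.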
Step three:
Track similarity is calculated using a dynamic time warping algorithm.
Comparing the longitude and latitude sequence recorded by the card swiping with the longitude and latitude sequence of the track;
wherein:
$X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ represents the longitude-latitude sequence of the card-swiping record;
$Y = \{(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)\}$ represents the longitude-latitude sequence of the track.
For both sequences, a dynamic time warping algorithm is used to find the optimal alignment so that the cumulative distance between them is minimized.
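The calculation formula is as follows: $\mathrm{DTW}(X, Y) = \min_{\pi} \sum_{(i, j) \in \pi} \sqrt{(x_i - u_j)^2 + (y_i - v_j)^2}$,
where π ranges over the warping paths that align X with Y, and DTW(X, Y) denotes the similarity score.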
In other words, for each aligned pair of points, the Euclidean distance between the point of the card-swiping sequence and the point of the track sequence is calculated, and these distances are summed cumulatively.
Intuitively, dynamic time warping stretches or compresses the time axis of one sequence relative to the other to find the best match. This stretching and compressing is the core idea of the algorithm, and it allows the sequences to be aligned better in time.
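The dynamic time warping recurrence can be sketched in Python as below. It uses the plain Euclidean distance between (longitude, latitude) pairs as the local cost, as the text describes. On real data a geodesic distance may be preferable, but that is an assumption beyond the source. `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(X: Sequence[Point], Y: Sequence[Point]) -> float:
    """Minimum cumulative Euclidean distance over all warping paths aligning X and Y."""
    n, m = len(X), len(Y)
    # D[i][j] = cost of the best alignment of the first i points of X with the first j of Y.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(X[i - 1], Y[j - 1])
            # Diagonal step = one-to-one match; vertical/horizontal = stretching or compressing time.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


swipe_seq = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_seq = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # same path, sampled more densely
print(dtw_distance(swipe_seq, track_seq))        # 0.0
print(dtw_distance([(0.0, 0.0)], [(3.0, 4.0)]))  # 5.0
```

This quantity is a distance: it is 0 for identical paths and grows as the paths diverge. If the matching rule is applied as "score greater than α", the distance would first have to be turned into a similarity, for example 1 / (1 + DTW). That conversion is an assumption, not part of the source.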
Step four:
Set a similarity threshold. If the similarity score is greater than the threshold, the longitude-latitude sequence of the card-swiping record and that of the track are considered sufficiently similar. The calculation formula is: $\mathrm{MatchPairs} = \{ (X_i, Y_j) \mid \mathrm{DTW}(X_i, Y_j) > \alpha \}$;
where MatchPairs is the matching data set and each element $(X_i, Y_j)$ is a matched pair of a card-swiping record and an actual track. The set is obtained by screening with the matching enhancement strategy. Its condition is that the dynamic time warping similarity score between the card-swiping record and the track exceeds the similarity threshold α.
Specifically, each card-swiping record is compared with all tracks using the dynamic time warping algorithm. If a track is found whose similarity score with the record is greater than the similarity threshold, the record and the track are paired and included in the matching data set.
For example, suppose the three card-swiping records $X_1, X_2, X_3$ match the trajectories $Y_1, Y_2, Y_3$, respectively. Then $\mathrm{MatchPairs} = \{ (X_1, Y_1), (X_2, Y_2), (X_3, Y_3) \}$.
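The screening rule can be sketched as follows. The similarity function is passed in, so the block stays independent of how the score is computed. The toy score `1 / (1 + summed point distance)` is an illustrative assumption, not a detail from the source. So is the choice to keep only the highest-scoring track for each swipe record.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Seq = Sequence[Point]


def match_pairs(swipes: List[Seq], tracks: List[Seq],
                similarity: Callable[[Seq, Seq], float], alpha: float) -> List[Tuple[int, int]]:
    """Return (swipe index, track index) pairs whose similarity exceeds alpha."""
    pairs = []
    for i, x in enumerate(swipes):
        best_j, best_score = None, alpha
        for j, y in enumerate(tracks):
            score = similarity(x, y)
            if score > best_score:  # strictly above alpha and above earlier candidates
                best_j, best_score = j, score
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs


def toy_similarity(x: Seq, y: Seq) -> float:
    """Hypothetical score in (0, 1]: equals 1 when the point sequences coincide."""
    return 1.0 / (1.0 + sum(math.dist(p, q) for p, q in zip(x, y)))


swipes = [[(0, 0), (1, 1)], [(5, 5), (6, 6)], [(50, 50), (51, 51)]]
tracks = [[(5, 5), (6, 6)], [(0, 0), (1, 1)]]
print(match_pairs(swipes, tracks, toy_similarity, alpha=0.5))  # [(0, 1), (1, 0)]
```

The third swipe record has no track above the threshold, so it is left out of the matching data set rather than forced into a poor pair.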
Through the above steps, the association between card-swiping records and actual tracks is established. Time alignment ensures that both data sources share the same time scale. The trajectory similarity algorithm and the matching enhancement strategy ensure that the selected matching pairs are highly similar, which provides a reliable data basis for subsequent analysis.
The step S200 specifically includes the following:
obtain, through the above steps, a matching data set comprising card-swiping records and their matched tracks;
compile basic information for each user, such as card-swiping frequency, frequently used boarding and alighting stops, and travel time periods;
analyze the stay points, running speed, path, and similar attributes of the matched tracks to identify the user's preliminary travel pattern.
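As an illustration of the per-user statistics listed above, the sketch below counts trips per card. It then picks the most frequent boarding stop, alighting stop, and hour of travel. Several simplifying assumptions are made here. Each matched record is assumed to have been reduced to a hypothetical `(card_id, boarding_stop, alighting_stop, hour)` tuple. The alighting stop is assumed to be inferred from the matched track. A single peak hour stands in for the "travel time period".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Trip = Tuple[str, str, str, int]  # (card_id, boarding_stop, alighting_stop, hour): hypothetical layout


def user_profiles(trips: Iterable[Trip]) -> Dict[str, dict]:
    """Per-card trip count plus most frequent boarding stop, alighting stop, and hour."""
    by_card = defaultdict(list)
    for card_id, board, alight, hour in trips:
        by_card[card_id].append((board, alight, hour))
    profiles = {}
    for card_id, rows in by_card.items():
        profiles[card_id] = {
            "frequency": len(rows),
            "common_boarding": Counter(r[0] for r in rows).most_common(1)[0][0],
            "common_alighting": Counter(r[1] for r in rows).most_common(1)[0][0],
            "peak_hour": Counter(r[2] for r in rows).most_common(1)[0][0],
        }
    return profiles


trips = [("C1", "S1", "S5", 8), ("C1", "S1", "S5", 8), ("C1", "S5", "S1", 18), ("C2", "S3", "S4", 9)]
print(user_profiles(trips)["C1"])
# {'frequency': 3, 'common_boarding': 'S1', 'common_alighting': 'S5', 'peak_hour': 8}
```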
Judge the user's travel regularity according to the matching data set.
Obtain the inter-cluster density index with a clustering algorithm, and the order-related pattern abundance index with a rank-order algorithm;
As shown in FIG. 3, the inter-cluster density index is obtained as follows:
Step one, preparing a matching data set which comprises samples needing to be clustered;
Step two, dividing the data into K clusters by using K-means clustering;
Step three, run the selected clustering algorithm on the data to obtain the cluster to which each sample belongs;
The calculation formula is as follows: $J(C) = \sum_{k=1}^{K} \sum_{d_l \in c_k} \lVert d_l - u_k \rVert^2$,
where J(C) is the sum of squared distances from all sample points in all clusters to their cluster centers; K is the number of clusters, set in advance; $c_k$ is the k-th cluster, with k ranging from 1 to K; $d_l$ is the l-th sample point in the matching data set; and $u_k$ is the center, i.e., the mean, of the k-th cluster;
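A plain-Python sketch of K-means (Lloyd's iterations) together with the objective J(C) follows. Random initialization from the data points, the iteration cap, and the function names are assumptions made for illustration. A production system would more likely use a library implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm: returns (labels, centers)."""
    rng = random.Random(seed)
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]

    def nearest(p: Point) -> int:
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # New center u_k = mean of the members of c_k; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return [nearest(p) for p in points], centers


def objective(points: Sequence[Point], labels: Sequence[int], centers: Sequence[Point]) -> float:
    """J(C): sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(pts, k=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])  # True True True
print(objective(pts, labels, centers))  # 1.0
```

Each Lloyd iteration cannot increase J(C), which is why the loop stops once the centers no longer move.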
Step four, calculate the intra-cluster and inter-cluster distances of each sample.
For each sample, the average distance to the other samples in the same cluster is calculated as $a(l) = \frac{1}{|C_l| - 1} \sum_{v \in C_l,\, v \neq l} d(l, v)$,
where a(l) is the average distance from sample l to the other samples in its cluster; $|C_l|$ is the number of samples in $C_l$, the cluster containing sample l; d(l, v) is the distance between samples l and v; and the summation is the total distance from sample l to the other samples in the same cluster.
a(l) evaluates how tightly sample l is aggregated within its cluster. In the silhouette (contour) coefficient, a smaller a(l) means a smaller average distance from sample l to the samples in its own cluster, and thus a higher degree of aggregation.
For each sample, the average distance to all samples in each other cluster is calculated, and the nearest such cluster gives the sample's inter-cluster average distance: $b(l) = \min_{C \neq C_l} \frac{1}{|C|} \sum_{v \in C} d(l, v)$,
where b(l) is the inter-cluster average distance of sample l. The minimum is taken over all clusters C other than $C_l$, the cluster containing sample l, so it selects the cluster closest to sample l. The term $\frac{1}{|C|} \sum_{v \in C} d(l, v)$ is the average distance from sample l to all samples of cluster C. In other words, b(l) is the average distance from sample l to all samples of the nearest other cluster.
b(l) is therefore the minimum, over all clusters other than the sample's own, of the average distance from the sample to that cluster's members. This step evaluates how the sample is positioned relative to its own cluster and to other clusters: the smaller b(l), the closer the sample lies to another cluster.
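Step five, obtain the silhouette (contour) coefficient of each sample.
For each sample l, its silhouette coefficient is calculated as $s(l) = \frac{b(l) - a(l)}{\max\{a(l), b(l)\}}$.
Step six, average the silhouette coefficients of all samples to obtain the inter-cluster density index.
The calculation formula is: $\mathrm{SilhouetteScore} = \frac{1}{N} \sum_{l=1}^{N} s(l)$, where N is the number of samples in the matching data set.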
The larger the inter-cluster density index, the larger the average distance between sample points in different clusters. In that case the differences between clusters are pronounced and the similarity between sample points is low. This makes it difficult to analyze the regularity of customers' travel patterns: because the sample points differ greatly, obvious commonalities or rules are hard to find, and grasping the overall pattern becomes complex. Conversely, the smaller the inter-cluster density index, the smaller the average distance between sample points in different clusters. The differences between clusters are then smaller and the similarity between sample points is higher. The overall pattern is easier to find in this case, because the sample points are more alike and their commonality is more pronounced. In general, a larger inter-cluster density index indicates more heterogeneous travel patterns, whereas a smaller one may indicate more concentrated and distinct patterns.
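Steps four to six can be sketched in Python as below, computing a(l), b(l), s(l), and their average directly from the definitions. Several choices here are assumptions: Euclidean distance, the convention s(l) = 0 for a sample alone in its cluster (the usual convention for silhouette scores), and the function name. At least two clusters are required.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s(l) = (b(l) - a(l)) / max(a(l), b(l))."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for l, p in enumerate(points):
        own = [v for v in clusters[labels[l]] if v != l]
        if not own:
            scores.append(0.0)  # sample alone in its cluster
            continue
        a = sum(math.dist(p, points[v]) for v in own) / len(own)   # a(l)
        b = min(                                                   # b(l): nearest other cluster
            sum(math.dist(p, points[v]) for v in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[l]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 3))  # well-separated split: close to 1
print(round(silhouette_score(pts, [0, 1, 0, 1]), 3))  # poor split: negative
```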
As shown in FIG. 4, the order-related pattern abundance index is obtained as follows:
Step one, prepare the matching data set containing the samples to be analyzed;
Step two, rank the related variables of the samples, i.e., order the feature values of the samples by magnitude;
Step three, for each sample, calculate the rank difference $d_g = \mathrm{rank}(X_g) - \mathrm{rank}(Y_g)$,
where $X_g$ and $Y_g$ are the values of sample g on the two related variables, and $\mathrm{rank}(\cdot)$ denotes the rank;
Step four, square the rank differences and calculate the order-related pattern abundance index from their sum: $\mathrm{SpearmanRank} = 1 - \frac{6 \sum_{g=1}^{M} d_g^2}{M(M^2 - 1)}$,
where M is the number of samples.
The order-related pattern abundance index reflects the strength of the ordering relationship between two related variables in the matching data set. It measures a monotonic, not necessarily linear, relationship between the two variables. The larger the index, the stronger the ordering trend between the variables and hence the stronger their correlation. The ordering relationship between the variables is also more stable, so a regular pattern forms more easily. Conversely, a smaller index indicates a weaker ordering relationship and a weaker correlation between the two variables. It may suggest that the ordering between them is relatively random, with weaker regularity. The magnitude of the index therefore reflects the strength of the ordering regularity between variables in the data set, which is important for understanding how the variables relate. The order-related pattern abundance index is thus positively correlated with regularity.
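The rank-difference formula of steps two to four can be sketched as follows. Tied values receive the average of their positions. This is a common convention but an assumption here, and with ties the 6Σd² shortcut only approximates the Spearman coefficient computed from ranks. The function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rank(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1 - 6 * sum(d_g^2) / (M (M^2 - 1)), with d_g = rank(X_g) - rank(Y_g)."""
    m = len(xs)
    if m < 2 or m != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d2 / (m * (m * m - 1))


print(spearman_rank([1, 2, 3, 4], [1, 4, 9, 16]))       # 1.0 (monotonic but non-linear)
print(spearman_rank([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The first example shows why the index captures ordering rather than linearity: y = x² is not linear, yet the ranks agree perfectly.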
The evaluation coefficient is obtained by normalizing the inter-cluster density index and the order-related pattern abundance index and combining them by weighting, according to: $E = w_1 \cdot \widetilde{\mathrm{SilhouetteScore}} + w_2 \cdot \widetilde{\mathrm{SpearmanRank}}$,
where E is the evaluation coefficient. SilhouetteScore and SpearmanRank denote the inter-cluster density index and the order-related pattern abundance index, respectively, and the tilde denotes their normalized values. w1 and w2 are the preset proportional coefficients of the two indices, and both are greater than 0.
The evaluation coefficient, a normalized weighted sum of the two indices, evaluates the regularity of the sample points in the matching data set. It jointly considers how distinct the samples are across clusters and how consistent their ordering patterns are: the larger the coefficient, the greater the between-cluster differences among the samples, the more consistent the ordering patterns, and the more pronounced the regularity.
When assessing how difficult a customer's travel patterns are to analyze, the evaluation coefficient directly reflects the regularity of the travel patterns as a whole. A larger coefficient means greater between-cluster differences, more consistent ordering patterns and more regular travel, so the analysis is relatively easier and more meaningful. A smaller coefficient means the travel patterns may be less regular, with smaller between-cluster differences and less consistent ordering, which makes the customer's travel patterns harder to analyze.
As a composite index, the evaluation coefficient thus helps assess the regularity of the samples in the matching data set as a whole, so that customers' travel patterns can be better understood and analyzed.
The evaluation coefficient is compared with the analyzable degree threshold. If the coefficient is greater than or equal to the threshold, the samples in the data set show large between-cluster differences, consistent ordering patterns and clear regularity, so the customer's travel patterns can be analyzed further with higher confidence, and a highly analyzable signal is generated.
If the evaluation coefficient is less than the analyzable degree threshold, the samples show small between-cluster differences, weak ordering consistency and weak regularity; the travel patterns are more complex or less distinct and harder to analyze, and a low-degree analyzable signal is generated.
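The sketch below illustrates the evaluation coefficient and the threshold comparison. The source states only that both indices are normalized, weighted and summed; mapping each index from [-1, 1] to [0, 1] with (v + 1) / 2, the default weights w1 = w2 = 0.5 and the threshold used in the example are assumptions.

```python
def normalize(v):
    """Map an index from [-1, 1] to [0, 1] (assumed normalization)."""
    if not -1.0 <= v <= 1.0:
        raise ValueError("index expected in [-1, 1]")
    return (v + 1.0) / 2.0


def evaluation_coefficient(silhouette_score, spearman_rank, w1=0.5, w2=0.5):
    """E = w1 * norm(SilhouetteScore) + w2 * norm(SpearmanRank), with w1, w2 > 0."""
    if w1 <= 0 or w2 <= 0:
        raise ValueError("w1 and w2 must both be greater than 0")
    return w1 * normalize(silhouette_score) + w2 * normalize(spearman_rank)


def analyzability_signal(e, threshold):
    """'high' (highly analyzable) if E >= threshold, else 'low' (low-degree)."""
    return "high" if e >= threshold else "low"


# Hypothetical index values and threshold.
print(analyzability_signal(evaluation_coefficient(0.6, 0.8), threshold=0.7))  # high
```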
In the invention, after a user's matching data set is acquired, its inter-cluster density index and order-related pattern abundance index are computed, normalized and weighted to obtain the evaluation coefficient, which is compared with the analyzable degree threshold to generate a highly or low-degree analyzable signal. This evaluates in advance whether regularity analysis is feasible, and the quantitative evaluation coefficient makes it possible to select, in a targeted way, base data that exhibit regularity and trends, improving the accuracy and efficiency of the regularity analysis. The pre-evaluation streamlines data processing and avoids unnecessary computation and analysis, so meaningful information is mined more effectively and subsequent regularity analysis rests on a more reliable basis.
Step S300 specifically includes the following:
using the matching data sets for which a highly analyzable signal was generated as base data, extract the user's features and define the user's riding-intent labels.
These features include, but are not limited to, travel frequency, commonly used routes, travel time periods, boarding/alighting preferences and dwell time. The user's riding-intent label describes, for example, the route and time period the user is likely to take next.
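As a sketch of the feature-extraction step, the function below derives a few of the listed features from one user's matched records. The record layout (keys `route`, `board_stop`, `alight_stop`, `time`) and the sample records are hypothetical, and dwell time is omitted because it would need alighting timestamps.

```python
from collections import Counter
from datetime import datetime


def extract_features(records):
    """Derive simple travel features from one user's matched records."""
    if not records:
        raise ValueError("no records for this user")
    active_days = {r["time"].date() for r in records}
    routes = Counter(r["route"] for r in records)
    hours = Counter(r["time"].hour for r in records)
    boards = Counter(r["board_stop"] for r in records)
    alights = Counter(r["alight_stop"] for r in records)
    return {
        "trips_per_active_day": len(records) / len(active_days),  # travel frequency
        "common_route": routes.most_common(1)[0][0],
        "peak_hour": hours.most_common(1)[0][0],  # most frequent travel hour
        "preferred_board_stop": boards.most_common(1)[0][0],
        "preferred_alight_stop": alights.most_common(1)[0][0],
    }


# Hypothetical matched records for a single user.
example_records = [
    {"route": "R1", "board_stop": "A", "alight_stop": "B", "time": datetime(2024, 1, 2, 8, 5)},
    {"route": "R1", "board_stop": "B", "alight_stop": "A", "time": datetime(2024, 1, 2, 18, 10)},
    {"route": "R1", "board_stop": "A", "alight_stop": "B", "time": datetime(2024, 1, 3, 8, 15)},
    {"route": "R2", "board_stop": "A", "alight_stop": "C", "time": datetime(2024, 1, 3, 12, 0)},
]
features = extract_features(example_records)
```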
A training data set is then constructed from historical matching data, containing user features and the corresponding riding-intent labels, and a random forest is trained on it as the machine learning model.
The training data are $O = \{(Q_1, P_1), (Q_2, P_2), \ldots, (Q_h, P_h)\}$, where $Q_f$ is a feature vector and $P_f$ its label. For classification, the random forest predicts by a majority vote over the outputs of its individual decision trees:
$SJSL_{pred} = \mathrm{MajorityVote}(r_1(Q_{new}), r_2(Q_{new}), \ldots, r_h(Q_{new}))$;
where $r_f$ is an individual decision tree and $Q_{new}$ is the sample to be predicted.
For a user to be predicted, the corresponding features are extracted and the trained model predicts the user's riding intent.
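The prediction formula can be illustrated with a hand-written majority vote over individual trees. The trees `r1` to `r3` below are hypothetical decision stumps standing in for trained trees; bootstrap sampling and random feature selection during training are not shown (a library implementation such as scikit-learn's `RandomForestClassifier` would normally handle both).

```python
from collections import Counter


def majority_vote(trees, q_new):
    """SJSL_pred = MajorityVote(r_1(Q_new), ..., r_h(Q_new)).

    Each tree is any callable mapping a feature vector to a label.
    Ties are broken in favour of the label that first received a vote.
    """
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    votes = [tree(q_new) for tree in trees]
    counts = Counter(votes)
    top = max(counts.values())
    return next(label for label in votes if counts[label] == top)


# Hypothetical stumps; features are [hour_of_day, trips_per_active_day].
def r1(q):
    return "R1" if q[0] < 10 else "R2"


def r2(q):
    return "R1" if q[1] > 1.5 else "R2"


def r3(q):
    return "R2"


print(majority_vote([r1, r2, r3], [8, 2.0]))  # R1
```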
In identifying users' travel patterns and estimating riding intent, the invention matches card-swipe records with actual trajectories and identifies travel patterns using techniques such as spatiotemporal clustering. It then extracts behavioral features such as travel frequency, commonly used routes and travel times through basic statistical and trajectory analysis, and trains a machine learning model to infer the user's riding intent. This process gives deeper insight into users' travel behavior and produces behavioral analysis results that can substantively support urban transport planning and public transport service optimization. By identifying travel patterns, the system understands user demand more accurately, enabling personalized public transport services, improving traffic efficiency, easing congestion and offering users a more convenient and comfortable travel experience.
Step S400 specifically includes the following:
Fuse the user's travel history data, including card-swipe records, boarding and alighting stops, and time slots.
Acquire related data such as bus stops, routes and vehicle trajectories.
Collect dynamic scene factors that affect bus reachability, such as traffic flow, weather and special events.
Abstract the bus stops and the roads or paths between them into a network graph.
Integrate the user's travel characteristics into the network, for example by treating the user's boarding and alighting stops as nodes.
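Calculate the base reachability between bus stops using Dijkstra's algorithm or a similar shortest-path method: $\mathrm{BaseReachability}(b, d) = 1 / \mathrm{PathLength}(b, d)$, where PathLength(b, d) is the shortest path length from stop b to stop d;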
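A sketch of the base reachability computation using Dijkstra's algorithm with Python's `heapq`. The adjacency-dictionary graph format, the requirement of positive edge lengths and the choice to return 0 for unreachable stops are assumptions not stated in the source.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra from source; graph maps stop -> {neighbor: positive length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def base_reachability(graph, b, d):
    """BaseReachability(b, d) = 1 / PathLength(b, d); 0.0 if d is unreachable."""
    if b == d:
        raise ValueError("reachability is defined here only for b != d")
    length = shortest_path_lengths(graph, b).get(d)
    return 0.0 if length is None else 1.0 / length


# Hypothetical three-stop network (lengths in km).
stops = {"A": {"B": 2.0, "C": 5.0}, "B": {"C": 1.0}, "C": {}}
print(base_reachability(stops, "A", "B"))  # 0.5
```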
Adjust the time and space weights according to the periodically updated dynamic scene factors: $\mathrm{TotalWeight}(b, d) = \mathrm{BaseWeight} \times \mathrm{TimeWeight} \times \mathrm{SpatialWeight}$;
The base weight is determined by the distance or network connectivity between bus stops; shorter distances or more direct connections may be given higher base weights.
The time weight is determined from historical or real-time data. For example, peak hours with heavier traffic may be assigned lower time weights to reflect congestion.
The spatial weight typically reflects the environment around a stop, such as nearby commercial or residential areas, and can be obtained from urban planning data or geographic information system (GIS) data.
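The three weight factors could be combined as below. The functional form of the base weight, the peak hours, the peak-hour weight of 0.7 and the land-use weights are hypothetical placeholders; the source specifies only the product form and the qualitative direction of each weight.

```python
PEAK_HOURS = {7, 8, 17, 18}  # hypothetical peak periods
SPATIAL_WEIGHTS = {"commercial": 1.2, "residential": 1.0, "industrial": 0.8}  # hypothetical


def base_weight(distance_km):
    """Shorter distance gives a higher base weight (assumed form 1 / (1 + distance))."""
    return 1.0 / (1.0 + distance_km)


def time_weight(hour, peak_weight=0.7):
    """Lower weight during peak hours to reflect congestion (assumed values)."""
    return peak_weight if hour in PEAK_HOURS else 1.0


def spatial_weight(land_use):
    """Weight from the land use around the stop, defaulting to 1.0."""
    return SPATIAL_WEIGHTS.get(land_use, 1.0)


def total_weight(distance_km, hour, land_use):
    """TotalWeight(b, d) = BaseWeight * TimeWeight * SpatialWeight."""
    return base_weight(distance_km) * time_weight(hour) * spatial_weight(land_use)


print(total_weight(1.0, 12, "residential"))  # 0.5
```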
Combine the user's travel characteristics with the calculated reachability to form a comprehensive user travel feature-reachability model: $\mathrm{UserFeatureReachability}(b, d) = \mathrm{BaseReachability}(b, d) \times \mathrm{UserFeatureWeight} \times \mathrm{TotalWeight}(b, d)$;
Calculate the matching degree between the fused user feature-reachability model and actual bus trip data: $\mathrm{Match}(b, d) = \mathrm{ActualTrips}(b, d) / \mathrm{UserFeatureReachability}(b, d)$.
Compare the matching degree with the matching threshold. If the matching degree is greater than or equal to the threshold, the model's predictions are highly consistent with the actual trip data and the fit meets expectations; the model is in good condition, the scene factors need not be updated more often, and no signal is generated;
If the matching degree is less than the matching threshold, the model's predictions differ considerably from the actual trip data and the fit is poor. The scene factors should then be updated more frequently to capture changes more sensitively, and an update scene factor frequency signal is generated.
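The following sketch chains the user feature-reachability model, the matching degree and the threshold check, taking the formulas literally as written. The signal string `UPDATE_SCENE_FACTOR_FREQUENCY`, the threshold and the example values are hypothetical.

```python
def user_feature_reachability(base_reach, user_feature_weight, total_w):
    """UserFeatureReachability(b, d) = BaseReachability * UserFeatureWeight * TotalWeight."""
    return base_reach * user_feature_weight * total_w


def match_degree(actual_trips, ufr):
    """Match(b, d) = ActualTrips(b, d) / UserFeatureReachability(b, d)."""
    if ufr <= 0:
        raise ValueError("model reachability must be positive")
    return actual_trips / ufr


def scene_factor_signal(match, threshold):
    """No signal (None) if the fit is acceptable, else request more frequent updates."""
    return None if match >= threshold else "UPDATE_SCENE_FACTOR_FREQUENCY"


ufr = user_feature_reachability(0.5, 2.0, 0.5)  # hypothetical inputs -> 0.5
print(scene_factor_signal(match_degree(40, ufr), threshold=50.0))  # None
```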
The invention abstracts bus stops and roads into a network graph, fuses in users' travel characteristics, calculates base reachability with Dijkstra's algorithm, and accounts for time and space weights to form a user feature-reachability model. The matching degree is calculated and compared with a threshold to assess how well the model fits and to decide whether the scene factors should be updated more often. This integration improves how closely the model fits actual travel, and by dynamically adjusting the weights and the update frequency the system adapts more accurately to changes in the urban traffic environment. It thereby provides real-time data support for public transport planning, helps the relevant authorities assess the city's underlying operating mechanisms, improves the operating efficiency of the public transport system, and supports policies for optimizing urban spatial structure.
FIG. 2 is a schematic diagram of the urban intelligent public transportation system of the present invention, which includes a similarity calculation module, a rule matching module, a behavior analysis module and a feature integration module;
the similarity calculation module collates bus card data and trajectory data, applies dynamic time warping to compare the card-swipe record sequence with the trajectory sequence and calculate their similarity score, captures the relationship between card-swipe records and actual trajectories by dynamically trading off time and space, obtains the matching data set, and sends it to the rule matching module;
the rule matching module evaluates the user's travel regularity from the matching data set and sends the evaluation result to the behavior analysis module;
the behavior analysis module, according to the evaluation result, identifies the user's travel patterns from the card-swipe records, analyzes the user's behavior, estimates the user's riding intent from historical travel patterns using the machine learning model, and sends the travel characteristics to the feature integration module;
the feature integration module combines the user's travel characteristics with the network analysis of the bus reachability model: it integrates individual user characteristics into the bus stop network, calculates real-time reachability using the dynamic scene factors, optimizes the weights, recalculates the matching degree, and adjusts the update frequency of the dynamic scene factors based on the matching degree.
All of the above formulas are dimensionless and evaluated numerically. They were derived by software simulation over a large volume of collected data so as to reflect actual conditions as closely as possible, and the preset parameters and thresholds in them are set by those skilled in the art according to actual conditions.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD) or a semiconductor medium, such as a solid-state drive.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps described in the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or any other medium capable of storing program code.
The foregoing describes merely specific embodiments of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (3)

1. An urban intelligent public transportation method, characterized by comprising:
step S100: collating bus card data and trajectory data; using dynamic time warping to compare the card-swipe record sequence with the trajectory sequence and calculate their similarity score; capturing the relationship between card-swipe records and actual trajectories by dynamically trading off time and space; and obtaining a matching data set;
step S200: evaluating the user's travel regularity from the matching data set;
Step S300: according to the evaluation result, identifying the user's travel patterns from the card-swipe records, analyzing the user's behavior, and then estimating the user's riding intent from historical travel patterns using a machine learning model;
Step S400: in a network analysis process that combines the user's travel characteristics with a bus reachability model, integrating individual user characteristics into the bus stop network, calculating real-time reachability using dynamic scene factors, optimizing the weights, recalculating the matching degree, and adjusting the update frequency of the dynamic scene factors based on the matching degree;
the step S200 specifically includes the following:
using the matching data set, compiling basic information for each user and evaluating each user's travel regularity in turn; obtaining an inter-cluster density index with a clustering algorithm and an order-related pattern abundance index with a rank correlation algorithm;
The acquisition process of the inter-cluster density index comprises the following steps:
Step one, preparing a matching data set which comprises samples needing to be clustered;
Step two, dividing the data into K clusters by using K-means clustering;
Step three: clustering the data with the selected clustering algorithm to obtain the cluster to which each sample belongs;
Step four: calculating, for each sample, the mean distance to the other samples in its own cluster and the mean distance to the samples of the nearest other cluster;
Step five: obtaining the silhouette coefficient of each sample;
Step six: averaging the silhouette coefficients of all samples to obtain the inter-cluster density index;
the order-related pattern abundance index is obtained by the following steps:
step one, preparing a matching data set which comprises samples to be analyzed;
Step two: ranking the related variables of each sample, i.e., ordering the samples' feature values by magnitude;
Step three: calculating the rank difference of each sample;
Step four: squaring the rank differences and calculating the order-related pattern abundance index from the squared differences;
Normalizing and weighting the inter-cluster density index and the order-related pattern abundance index to calculate an evaluation coefficient; comparing the evaluation coefficient with the analyzable degree threshold, and generating a highly analyzable signal if the evaluation coefficient is greater than or equal to the analyzable degree threshold, or a low-degree analyzable signal if the evaluation coefficient is less than the analyzable degree threshold;
The step S300 specifically includes the following:
Using the matching data sets for which a highly analyzable signal was generated as base data, extracting user features and defining the user's riding-intent labels; constructing a training data set from historical matching data containing user features and the corresponding riding-intent labels, and training a random forest as the machine learning model on the constructed data set; for a user to be predicted, extracting the corresponding features and predicting the user's riding intent with the trained model;
step S400 specifically includes the following:
fusing user travel data and acquiring public transportation network data and scene factor data;
abstracting bus stops and roads or paths between the bus stops into a network diagram;
integrating the users' travel characteristics into the network, and calculating the base reachability between bus stops using Dijkstra's algorithm or a similar method; adjusting the time and space weights according to the periodically updated dynamic scene factors;
combining the travel characteristics of the user with the calculated reachability to form a comprehensive travel characteristic-reachability model of the user;
Matching degree calculation is carried out on the fused user characteristic-reachability model and actual bus trip data;
Comparing the matching degree with a matching threshold value, and if the matching degree is greater than or equal to the matching threshold value, not generating any signal; and if the matching degree is smaller than the matching threshold value, generating an updated scene factor frequency signal.
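Steps four to six for the inter-cluster density index correspond to the standard mean silhouette coefficient. A minimal Python sketch, assuming Euclidean distance and cluster labels already produced by K-means (the clustering itself is not shown); samples alone in their cluster are given a silhouette of 0, a common convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient, used here as the inter-cluster density index.

    points: list of coordinate tuples; labels: list of cluster ids (e.g. from K-means).
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: silhouette taken as 0
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / n
```

Well-separated clusters give a score close to 1, while a labeling that mixes the clusters gives a negative score.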
2. The urban intelligent public transportation method according to claim 1, characterized in that:
Step S100 includes the following:
step one, acquiring bus card data and track data;
Step two: to ensure that the two data sources share the same time scale, performing time alignment: setting a time threshold and selecting the trajectory data within the time threshold to match with each card-swipe record;
step three: comparing the latitude-longitude sequence of the card-swipe record with the latitude-longitude sequence of the trajectory, using a dynamic time warping algorithm to find the optimal alignment that minimizes the cumulative distance;
step four: setting a similarity threshold; if the similarity score is greater than the threshold, pairing the card-swipe record with the trajectory and adding the pair to the matching data set.
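Claim 2 can be sketched as follows. The time-window filter and the classic dynamic time warping recurrence follow the steps above; treating raw latitude/longitude pairs as planar points, converting the DTW distance to a similarity with 1 / (1 + distance), and the record layouts are assumptions for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW over (lat, lon) points; returns the minimal cumulative distance."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity_score(seq_a, seq_b):
    """Assumed mapping from DTW distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(seq_a, seq_b))


def align_in_time(swipe_time, track, time_threshold):
    """Keep track points (t, lat, lon) within time_threshold seconds of the swipe."""
    return [(lat, lon) for t, lat, lon in track if abs(t - swipe_time) <= time_threshold]


def match_records(swipes, track, time_threshold, sim_threshold):
    """swipes: list of (swipe_id, time, [(lat, lon), ...]); returns matched pairs."""
    matches = []
    for swipe_id, t, seq in swipes:
        window = align_in_time(t, track, time_threshold)
        if window and similarity_score(seq, window) > sim_threshold:
            matches.append((swipe_id, window))
    return matches
```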
3. An urban intelligent public transportation system for implementing the urban intelligent public transportation method according to any one of claims 1-2, characterized by comprising a similarity calculation module, a rule matching module, a behavior analysis module and a feature integration module;
the similarity calculation module collates bus card data and trajectory data, applies dynamic time warping to compare the card-swipe record sequence with the trajectory sequence and calculate their similarity score, captures the relationship between card-swipe records and actual trajectories by dynamically trading off time and space, obtains the matching data set, and sends it to the rule matching module;
the rule matching module evaluates the user's travel regularity from the matching data set and sends the evaluation result to the behavior analysis module;
the behavior analysis module, according to the evaluation result, identifies the user's travel patterns from the card-swipe records, analyzes the user's behavior, estimates the user's riding intent from historical travel patterns using the machine learning model, and sends the travel characteristics to the feature integration module;
the feature integration module combines the user's travel characteristics with the network analysis of the bus reachability model: it integrates individual user characteristics into the bus stop network, calculates real-time reachability using the dynamic scene factors, optimizes the weights, recalculates the matching degree, and adjusts the update frequency of the dynamic scene factors based on the matching degree.
CN202410002831.6A 2024-01-02 2024-01-02 Urban intelligent public transportation system and method Active CN117807450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410002831.6A CN117807450B (en) 2024-01-02 2024-01-02 Urban intelligent public transportation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410002831.6A CN117807450B (en) 2024-01-02 2024-01-02 Urban intelligent public transportation system and method

Publications (2)

Publication Number Publication Date
CN117807450A CN117807450A (en) 2024-04-02
CN117807450B true CN117807450B (en) 2024-06-11

Family

ID=90419979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410002831.6A Active CN117807450B (en) 2024-01-02 2024-01-02 Urban intelligent public transportation system and method

Country Status (1)

Country Link
CN (1) CN117807450B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101638368B1 (en) * 2015-01-02 2016-07-11 경희대학교 산학협력단 Prediction System And Method of Urban Traffic Flow Using Multifactor Pattern Recognition Model
CN111932084A (en) * 2020-07-15 2020-11-13 江苏大学 System for assessing accessibility of urban public transport
KR20200136235A (en) * 2019-05-27 2020-12-07 주식회사 위드라이브 Method and system for identifying movement means based on a user movement path
CN112543427A (en) * 2020-12-01 2021-03-23 江苏欣网视讯软件技术有限公司 Method and system for analyzing and identifying urban traffic corridor based on signaling track and big data
CN116110215A (en) * 2022-10-08 2023-05-12 贵州交通职业技术学院 Urban traffic analysis method and system based on urban automobile bearing capacity
CN116386311A (en) * 2022-10-17 2023-07-04 扬州大学 Traffic accessibility assessment method considering extreme weather conditions
CN116502781A (en) * 2023-05-05 2023-07-28 东北师范大学 Bus route planning and influence factor visual analysis method based on GPS data
CN116882609A (en) * 2023-08-08 2023-10-13 大连理工大学 Customized bus route multi-objective optimization method based on improved track clustering algorithm
CN117272084A (en) * 2023-04-28 2023-12-22 安徽大学 Automatic clustering method for bus reachability time sequence

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20220270192A1 (en) * 2020-01-29 2022-08-25 Urban Dashboard Ltd Computerized-system and computerized-method to calculate an economic feasibility analysis for an urban planning model

Non-Patent Citations (3)

Title
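Spatiotemporal characteristics of rail transit passenger flow in Wuhan based on IC card data; Wang Guan; Chen Hua; Li Jianzhong; Sun Yilu; Urban Transport of China; 2018-07-25 (04); full text *
Construction and application of the public transport information system in Wuhan; Wang Guan; Chen Hua; Li Jianzhong; Sun Yilu; Urban Transport of China; 2016-05-25 (03); full text *
Accessibility-oriented whole-process public transport network planning method; Xu Huinong; Huang Wei; Chen Zhijian; Liu Yongping; Qin Jie; Gao Qiangfei; Urban Transport of China; 2019-12-31 (06); full text *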

Also Published As

Publication number Publication date
CN117807450A (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant