CN117875523B

CN117875523B - Bus stop optimizing method based on multi-source data

Info

Publication number: CN117875523B
Application number: CN202410281704.4A
Authority: CN
Inventors: 颜建强; 李植豪; 高原; 曲卜婷
Original assignee: NORTHWEST UNIVERSITY
Current assignee: NORTHWEST UNIVERSITY
Priority date: 2024-03-13
Filing date: 2024-03-13
Publication date: 2024-06-04
Anticipated expiration: 2044-03-13
Also published as: CN117875523A

Abstract

The application discloses a bus stop optimizing method based on multi-source data, which comprises the following steps: collecting bus track data, bus card swiping data, city interest point data and taxi order data; obtaining bus passenger flow data and taxi passenger travel data; based on NSGA-II algorithm, constructing a bus abnormal station recognition model according to bus passenger flow data and city interest point data, and solving to obtain an abnormal station result set; performing grid division and cluster analysis on taxi passenger travel data to obtain bus stop data to be added; and carrying out bus stop optimization according to the abnormal stop result set and the bus stop data to be added. According to the method, the abnormal stations and the stations to be added are identified by carrying out combined analysis on the bus track data, the bus card swiping data, the city interest point data and the taxi order data, so that the accuracy and the data utilization rate of the optimization of the bus stations are improved.

Description

Bus stop optimizing method based on multi-source data

Technical Field

The invention relates to the technical field of bus network optimization, in particular to a bus stop optimization method based on multi-source data.

Background

As urbanization advances, problems such as poorly placed bus stops, unreasonable routes and uneven network coverage are becoming increasingly severe, so the optimization of the public transport network needs to be considered.

In the prior art, patent publication No. CN113378337A, "an urban public transportation network optimization method based on passenger flow analysis", addresses the low operating efficiency of the public transportation system under traditional network optimization models. Taking into account how passenger flow characteristics change over time, it proposes a passenger-flow-based urban public transportation network optimization model and solves the model with a traditional genetic algorithm. Patent publication No. CN109118023A, "a public transportation network optimization method", uses a simulated annealing algorithm as its framework. With the goal of minimizing the operating company's total all-day operating cost, it obtains an initial network under that framework, breaks up the initial network into network units, takes the initial network as the input network, and embeds a genetic algorithm to optimize the routes and stations.

During bus stop optimization, identifying abnormal bus stops narrows the scope of the optimization and improves its accuracy, which avoids wasting optimization resources. The prior art, however, adjusts the whole network on a large scale without considering which specific stops are abnormal, so the accuracy of bus stop optimization is poor. In addition, taxi passenger travel data reflect people's travel demand and can therefore serve as a reference for bus stop optimization, yet the prior art does not analyze such data, so its data utilization rate is low.

Disclosure of Invention

The invention provides a bus stop optimizing method based on multi-source data, which is used for solving the problem that, in the prior art, the accuracy and the data utilization rate of bus stop optimization are poor.

In one aspect, the invention provides a bus stop optimizing method based on multi-source data, which comprises the following steps:

Step one, collecting bus track data, bus card swiping data, city interest point data and taxi order data.

Step two, obtaining bus passenger flow data according to the bus track data and the bus card swiping data, and obtaining taxi passenger trip data according to the taxi order data; the bus passenger flow data comprise bus stop data and corresponding stop passenger flow data, and the taxi passenger trip data comprise passenger starting place data and corresponding passenger destination data.

Step three, constructing a bus abnormal station identification model based on an NSGA-II algorithm according to the bus passenger flow data and the city interest point data, and solving to obtain an abnormal station result set.

Step four, performing grid division on the taxi passenger travel data to obtain grid data, and performing cluster analysis on the grid data to obtain bus stop data to be added.

Step five, optimizing the bus stops according to the abnormal stop result set and the bus stop data to be added.

In one possible implementation manner, in the second step, the obtaining bus passenger flow data according to the bus track data and the bus card swiping data includes: preprocessing the bus track data and the bus card swiping data to obtain preprocessed track data and preprocessed card swiping data; and matching the preprocessing track data with the preprocessing card swiping data to obtain the bus passenger flow data.

In the second step, the obtaining the taxi passenger trip data according to the taxi order data includes: preprocessing the taxi order data to obtain preprocessed order data, namely taxi passenger travel data.

In a possible implementation manner, in the second step, the matching the pre-processing track data with the pre-processing card swiping data to obtain the bus passenger flow data includes:

Step S21, inputting the preprocessing card swiping data.

Step S22, inputting the preprocessing track data.

Step S23, line matching is carried out on the preprocessing track data and the preprocessing card swiping data.

Step S24, performing vehicle matching and time matching on the pre-processing track data and the pre-processing card swiping data after the line matching to obtain bus passenger travel data.

Step S25, outputting the bus passenger travel data, wherein the set of the bus passenger travel data is the bus passenger flow data.

In one possible implementation, step three includes:

step S31, inputting the bus stop data and the corresponding stop passenger flow data.

Step S32, initializing the total number of abnormal bus stops, setting the maximum population scale and the maximum iteration number, and initializing the iteration number.

Step S33, generating an initial population randomly, i.e. a random bus stop set.

Step S34, calculating an objective function value of each bus station, and dividing the individuals into different non-dominant fronts.

Step S35, sorting the individuals according to the non-dominant grade and the crowding degree.

Step S36, selecting individuals according to the non-dominant grade and the crowding degree, and constructing a new population.

Step S37, performing cross operation on the selected individuals to generate new individuals.

Step S38, performing genetic variation operation on the current population, and introducing new variation.

Step S39, combining the initial population with the new individuals to generate a new generation population, adding one to the iteration number, returning to step S34 for circulation until the iteration number is equal to the maximum iteration number, and outputting non-dominant individuals of the final population, namely the abnormal site result set.

In one possible implementation, in step S33, the randomly generated initial population employs a roulette probability selection method.

In a possible implementation manner, in step S34, the objective function includes: an in-range passenger flow difference objective function, a same-line passenger flow difference objective function, an other-line passenger flow difference objective function and a city interest point objective function.

In a possible implementation manner, in step four, the performing grid division on the taxi passenger trip data to obtain grid data includes:

step S411, inputting the taxi passenger trip data.

Step S412, initializing a grid list.

In step S413, each taxi passenger travel data is added to the grid list.

Step S414, calculating longitude and latitude of each grid and the number of taxi passenger travel data in each grid, and removing the grids without data.

Step S415, outputting a grid set, i.e. the grid data.

In a possible implementation manner, in step four, performing cluster analysis on the grid data to obtain data about a bus stop to be added includes:

Step S421, inputting the grid data and initializing a cluster center set.

Step S422, a maximum distance threshold is set.

Step S423, selecting the clustering number.

Step S424, randomly selecting a plurality of grids as an initial cluster center set, distributing travel data of each taxi passenger, updating the cluster center, and carrying out iteration for a plurality of times until convergence to obtain a final cluster center set.

Step S425, outputting all cluster centers meeting a preset quantity threshold in the final cluster center set, namely the bus stop data to be added.

In one possible implementation, the fifth step includes:

Dividing an area by taking each abnormal site in the abnormal site result set as a circle center and taking a preset distance as a radius to obtain an abnormal site area result set.

Taking the stations in the bus stop data to be added that fall within the abnormal site area result set as the final stations to be added, and adding the actual bus stops accordingly.

The bus stop optimizing method based on the multi-source data has the following advantages:

By means of combined analysis of bus track data, bus card swiping data, city interest point data and taxi order data, abnormal stations and stations to be added are identified, and accuracy and data utilization rate of bus station optimization are improved.

The result set of the abnormal site is obtained through NSGA-II algorithm, and the accuracy of the abnormal site identification is improved.

The proposed objective function comprises a within-range passenger flow difference objective function, a same-line passenger flow difference objective function, other line passenger flow difference objective functions and a city interest point objective function, so that the accuracy of abnormal site identification is further improved.

The taxi passenger travel data are subjected to grid division to obtain grid data, and the grid data are subjected to cluster analysis to obtain the bus stop data to be added, so that the accuracy of the bus stop data to be added is improved.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of a bus stop optimizing method based on multi-source data according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in fig. 1, the embodiment of the invention provides a bus stop optimizing method based on multi-source data, which comprises the following steps:

Specifically, in this embodiment, the bus track data adopts bus GPS data, the bus card swiping data adopts bus IC card data, the city interest point data includes type and position data of the city interest point, and different weights are preset for different types of the city interest point.

In the second step, the obtaining bus passenger flow data according to the bus track data and the bus card swiping data includes: preprocessing the bus track data and the bus card swiping data to obtain preprocessed track data and preprocessed card swiping data; and matching the preprocessing track data with the preprocessing card swiping data to obtain the bus passenger flow data.

Specifically, the preprocessing includes: desensitization, removal of longitude and latitude drift data, removal of duplicate data, removal of records with missing fields, and conversion of coordinate systems and timestamps.

In the second step, the matching the preprocessed track data with the preprocessed card swiping data to obtain the bus passenger flow data includes:

Step S21, inputting the preprocessing card swiping data.

Step S22, inputting the preprocessing track data.

Specifically, in step S21, the input preprocessed card swiping data include a line, a vehicle number, a card swiping time, a card number and a fare; in step S22, the input preprocessed track data include a line, a vehicle number, a station name, station arrival and departure times, and station longitude and latitude; in step S23, the preprocessed card swiping data and the preprocessed track data of the same line are matched into the same line set; in step S24, the preprocessed card swiping data and the preprocessed track data with the same vehicle number in each line set are matched into a vehicle set under that line set, and within each vehicle set, the preprocessed card swiping data whose card swiping time lies between the station arrival time and the station departure time of the preprocessed track data are matched into a station set under that vehicle set; in step S25, the output bus passenger travel data include a line, a card swiping time, a card number, a fare, a station name, and station longitude and latitude.
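The line, vehicle and time matching of steps S23 to S25 can be illustrated with a minimal Python sketch. The record types `Swipe` and `StopVisit`, their field names and the function `match_swipes` are hypothetical names chosen for illustration; the embodiment does not prescribe a data layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Swipe:            # one preprocessed card swiping record (assumed layout)
    line: str
    vehicle: str
    time: datetime
    card: str
    fare: float


@dataclass
class StopVisit:        # one preprocessed track record (assumed layout)
    line: str
    vehicle: str
    station: str
    arrive: datetime
    depart: datetime
    lon: float
    lat: float


def match_swipes(swipes, visits):
    """Attach a station to each swipe by line, vehicle and time-window matching."""
    # Line matching (S23) and vehicle matching (S24): index visits by (line, vehicle).
    by_vehicle = {}
    for v in visits:
        by_vehicle.setdefault((v.line, v.vehicle), []).append(v)

    trips = []
    for s in swipes:
        for v in by_vehicle.get((s.line, s.vehicle), []):
            # Time matching (S24): the swipe must fall between arrival and departure.
            if v.arrive <= s.time <= v.depart:
                trips.append({"line": s.line, "time": s.time, "card": s.card,
                              "fare": s.fare, "station": v.station,
                              "lon": v.lon, "lat": v.lat})
                break
    return trips        # S25: the set of bus passenger travel records
```

Swipes with no matching visit are simply dropped in this sketch; how unmatched records are handled in practice is not stated in the text.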

The travel data of the bus passengers in this embodiment are shown in table 1:

TABLE 1 bus passenger travel data

Here CARD NUMBER denotes the card number, TIME PRICE the fare, CRATE TIME the card swiping time, STATION NAME the station name, ROUTE NAME the line, LONGTITUDE the station longitude, and LATITUDE the station latitude.

Illustratively, step three includes:

Specifically, in step S31, bus stop data and corresponding stop passenger flow data are input. Each bus stop data includes: station name, station longitude, station latitude, and route, each station passenger flow data comprising: station name, line and number of passenger flows. A bus stop may belong to multiple routes.

Step S32, setting the maximum population size to N and the maximum iteration number to g_max, and initializing the iteration number g = 1.

In step S33, an initial population with a population size N, i.e. a set of random bus stops, is randomly generated, and each stop is a potential solution.

Step S34, calculating an objective function value of each bus station, and dividing the individuals into different non-dominant fronts. The fronts are divided according to the dominance relations between individuals: one individual dominates another if it is at least as good on all objective functions and strictly better on at least one objective function.
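As an illustration of the dominance relation and front division in step S34, the following Python sketch splits a list of objective vectors into fronts. It assumes that all objectives are maximized, which matches the description of the passenger flow difference objectives, and it uses a simple repeated-filter approach rather than the faster bookkeeping of the original NSGA-II paper. The function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are assumed to be maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_fronts(points):
    """Return lists of indices into points, best front first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # An individual belongs to the current front if nothing left dominates it.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts
```

For example, with the objective vectors `(3, 3)`, `(1, 1)`, `(2, 4)` and `(0, 0)`, the first and third vectors do not dominate each other and form the first front.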

Step S35, sorting the individuals according to the non-dominant grade and the crowding degree. Individuals on a better non-dominant front are ranked first. When two individuals have the same non-dominant grade, the crowding degree decides between them: the individual in the less crowded region, i.e., the one with the larger crowding distance, is preferred.

The crowding degree, i.e., the crowding distance, of the k-th individual is defined as the sum, over all objective functions, of the differences between the objective function values of the (k-1)-th and the (k+1)-th individuals. Assuming that the total number of individuals in the non-dominant layer is K, the crowding degree of individual k in that layer is calculated as follows:

In step S351, the crowding degree of each individual is initially set to 0.

In step S352, the individuals are sorted by each objective function value in turn, and the crowding degree of the two individuals at the boundary is set to infinity, i.e., 0_d = K_d = ∞.

Step S353, calculating the crowding degree of the other individuals as follows:

$$k_d = \sum_{j=1}^{m} \left| f_j^{k+1} - f_j^{k-1} \right|$$

where k_d represents the crowding degree of the k-th individual, f_j^{k+1} represents the j-th objective function value of the (k+1)-th individual, f_j^{k-1} represents the j-th objective function value of the (k-1)-th individual, and m is the total number of objective functions; m is 4 in this embodiment.
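Steps S351 to S353 can be sketched in Python as follows. The sketch follows the text literally and sums raw neighbor differences; the commonly published NSGA-II variant additionally divides each difference by the range of that objective, which is not mentioned here. The function name is hypothetical.

```python
import math


def crowding_degree(front):
    """front: list of objective-value tuples belonging to one non-dominated layer.
    Returns one crowding value per individual, in the same order."""
    n = len(front)
    degree = [0.0] * n                       # S351: start from zero
    if n == 0:
        return degree
    m = len(front[0])                        # number of objective functions
    for j in range(m):
        # S352: sort by objective j and mark both boundary individuals as infinite.
        order = sorted(range(n), key=lambda i: front[i][j])
        degree[order[0]] = degree[order[-1]] = math.inf
        # S353: interior individuals accumulate the gap between their neighbours.
        for pos in range(1, n - 1):
            i = order[pos]
            degree[i] += front[order[pos + 1]][j] - front[order[pos - 1]][j]
    return degree
```

Because the list is sorted in ascending order for each objective, every accumulated difference is non-negative.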

Step S36, selecting individuals according to the non-dominant grade and the crowding degree, and constructing a new population. Excellent individuals are selected while ensuring diversity of the population.

Step S37, performing a crossover operation on the selected individuals to generate new individuals. Crossover helps to pass on excellent characteristics.

Step S38, performing genetic variation operation on the current population, and introducing new variation. Variation helps to maintain diversity in the population.

Step S39, combining the initial population with the new individuals to generate a new generation population, adding one to the iteration number, returning to step S34 for circulation until the iteration number is equal to the maximum iteration number, and outputting non-dominant individuals of the final population, namely, an abnormal site result set.

Illustratively, in step S33, the randomly generated initial population employs a roulette probability selection method.

Specifically, step S33 includes: step S331, calculating fitness values for each individual in the generated population.

Step S332, converting the fitness value of each individual into a selection probability. In this embodiment, the selection probability is calculated by dividing the fitness value of an individual by the sum of the fitness values of all individuals in the population, which ensures that the selection probabilities sum to 1. The selection probability E_i is given by:

$$E_i = \frac{\sum_{j=1}^{m} F_j}{\sum_{i=1}^{n} \sum_{k=1}^{m} F_{ik}}$$

where F represents a fitness value, i.e., an objective function value; n represents the total number of sites; m represents the total number of fitness functions, i.e., the total number of objective functions, and m is 4 in this embodiment; j denotes the j-th objective function of the current site; i indicates that the current site is the i-th site; k denotes the k-th objective function of the i-th site; F_j represents the objective function value of the current site; and F_ik represents the value of the k-th objective function of the i-th site.

In step S333, the selection probability is mapped to the wheel disc, so that the sector corresponding to an individual with higher fitness is larger. This can be achieved by calculating the cumulative probability. The cumulative probability C is given by:

$$C = \sum_{i=1}^{n} E_i$$

where E_i represents the selection probability of the i-th station, i represents the i-th station, and n represents the total number of stations.

In step S334, a pointer is rotated on the wheel, and the individual corresponding to the sector pointed by the pointer is selected. Since the sector corresponding to the individual with higher fitness is larger, the probability of the individual with higher fitness being selected is greater.

Step S335, repeating step S333 and step S334 until a number of individuals is selected to form a next generation population.
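Steps S331 to S335 describe fitness-proportionate (roulette wheel) selection. A minimal Python sketch under stated assumptions: each station's fitness is taken as the sum of its objective values, all fitness values are non-negative with a positive total, and selection is done with replacement. The function names and the `rng` parameter are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def selection_probabilities(fitness):
    """fitness[i] holds the objective values of station i.
    Returns each station's share of the population's total fitness."""
    totals = [sum(values) for values in fitness]        # S331
    grand_total = sum(totals)
    return [t / grand_total for t in totals]            # S332


def roulette_select(fitness, count, rng=random):
    """Pick `count` station indices with probability proportional to fitness."""
    probs = selection_probabilities(fitness)
    cumulative = list(accumulate(probs))                # S333: running sums
    chosen = []
    for _ in range(count):                              # S335: repeat the spin
        r = rng.random()                                # S334: pointer position in [0, 1)
        # First sector whose cumulative probability exceeds r; the min() guards
        # against the last running sum falling slightly below 1 due to rounding.
        idx = min(bisect_right(cumulative, r), len(probs) - 1)
        chosen.append(idx)
    return chosen
```

A station with zero fitness owns an empty sector and is never selected in this sketch.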

Illustratively, in step S34, the objective function includes: an in-range passenger flow difference objective function, a same-line passenger flow difference objective function, an other-line passenger flow difference objective function and a city interest point objective function.

Specifically, in this embodiment, the in-range passenger flow difference objective function is designed as follows: an objective function F₁ is set over the passenger flow of the other stations within one kilometer of a bus station, to measure the passenger flow difference between the current bus station and those stations. A large objective function value indicates that the station is abnormal, so the goal is to maximize the passenger flow difference from the surrounding stations. The formula is as follows:

Wherein P_i is the passenger flow volume of the current station, P_j is the passenger flow volume of another station within the surrounding one-kilometer range, and S is the total number of stations within the surrounding one-kilometer range.

In this embodiment, the same-line passenger flow difference objective function is designed as follows: the passenger flow of the current station is compared with that of the preceding and following stations on the same line, and an objective function F₂ is set, namely maximizing the passenger flow difference from the preceding and following stations on the same line. The formula is as follows:

Wherein P_i is the passenger flow volume of the current station, and P_z and P_k are the passenger flow volumes of the preceding and following stations on the same line as the current station, respectively.

In this embodiment, the other-line passenger flow difference objective function is designed as follows: a station may belong to multiple lines, so an objective function F₃ is set over the passenger flow on the other lines of the station, namely maximizing the passenger flow difference from the other lines of the station. The formula is as follows:

Wherein P_i is the passenger flow volume of the current station, P_m is the passenger flow volume on another line of the station, and M is the number of other lines of the station.

In this embodiment, the city interest point objective function is designed as follows: for the city interest points within a preset range around a site, a different weight is assigned to each type of city interest point, the weighted interest points are compared with the passenger flow, and an objective function F₄ is set. The formula is as follows:

Wherein P_i is the passenger flow volume of the current site, P_o is a city interest point around the site, O is the number of city interest points within the preset range around the site, and ω is the weight corresponding to each city interest point type, which can be set and adjusted manually according to the actual situation.

In the fourth step, the step of meshing the taxi passenger travel data to obtain mesh data includes:

step S411, inputting the taxi passenger trip data.

Step S412, initializing a grid list.

In step S413, each taxi passenger travel data is added to the grid list.

Step S415, outputting a grid set, i.e. the grid data.

Specifically, in step S411, inputting taxi passenger travel data includes: passenger origin data and passenger destination data (i.e., passenger origin latitude and longitude and passenger destination latitude and longitude sets).

Step S412, initializing an m×n grid list GList = <<G11, G12, ...>, <G21, G22, ...>, ...>, where each grid Gij = <p1, p2, p3, ...>, and at the same time initializing a grid set G = <g1, g2, ..., gN>, where each grid gi = (loni, lati, PCi), PCi is the total number of passenger start point data and passenger destination data in the grid, and loni and lati are the longitude and latitude of the grid.

In step S413, a mesh index to which each of the passenger start point data and the passenger destination data belongs is calculated, and the data is added to the mesh list.

In step S414, the longitude and latitude of each grid and the number of passenger start point data and passenger destination data in each grid are calculated. The longitude and latitude of a grid are taken as the mean longitude and latitude of all position points in the grid, and the resulting grid is added to the grid set G. Grids without data are removed.

Step S415, after all the passenger start point data and the passenger destination data are traversed, the grid set G, that is, the grid data is output.
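The grid division of steps S411 to S415 can be sketched in Python as follows. The cell size `cell_deg` (in degrees) and the function name are assumptions made for illustration; the text does not state a grid size. Start points and destinations are passed together as one list of (longitude, latitude) pairs.

```python
from collections import defaultdict


def grid_points(points, cell_deg=0.005):
    """points: iterable of (lon, lat) pairs (trip origins and destinations).
    Returns one (mean_lon, mean_lat, count) tuple per non-empty grid cell."""
    cells = defaultdict(list)                    # S412: the grid list
    for lon, lat in points:
        # S413: the cell index is the integer part of coordinate / cell size.
        key = (int(lon // cell_deg), int(lat // cell_deg))
        cells[key].append((lon, lat))

    grid_set = []
    for pts in cells.values():                   # S414: empty cells never appear
        n = len(pts)
        mean_lon = sum(p[0] for p in pts) / n
        mean_lat = sum(p[1] for p in pts) / n
        grid_set.append((mean_lon, mean_lat, n))
    return grid_set                              # S415: the grid data
```

Storing cells in a dictionary keyed by index means that grids without data are never created, which has the same effect as removing them afterwards.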

In the fourth step, the performing cluster analysis on the grid data to obtain the bus stop data to be added includes:

Step S421, inputting the grid data and initializing a cluster center set.

Step S422, a maximum distance threshold is set.

Step S423, selecting the clustering number.

Specifically, in step S421, the grid set G = <g1, g2, ..., gn> is input, where n is the number of grids, and the cluster center set C = <c1, c2, c3, ...> is initialized.

In step S422, the maximum distance threshold D_max is set to 800 meters.

Step S423, selecting the clustering number K by an elbow method.

In step S424, K grids are randomly selected as the initial cluster center set. For each passenger start point data and passenger destination data, the distance to each cluster center is calculated, and the data is assigned to the nearest cluster center. Within each cluster, the grid containing the most passenger start point data and passenger destination data is selected as the new cluster center. These steps are iterated until convergence to obtain the final cluster center set.

The distance H between two points is calculated as follows:

$$H = 2R \arcsin\left(\sqrt{\sin^2\frac{a}{2} + \cos\alpha_1 \cos\alpha_2 \sin^2\frac{b}{2}}\right)$$

where R represents the average radius of the earth, a = lat1 - lat2 is the difference between the latitudes of the two points in radians, b = lng1 - lng2 is the difference between the longitudes of the two points in radians, and α₁ and α₂ represent the latitudes of the two points in radians, respectively.
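The quantities R, a and b defined above are those of the haversine great-circle distance, which can be sketched in Python as follows. The radius value of 6,371,000 meters and the function name are assumptions; the text only refers to the average radius of the earth.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = phi1 - phi2                                   # latitude difference (radians)
    b = math.radians(lon1) - math.radians(lon2)       # longitude difference (radians)
    h = (math.sin(a / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(b / 2) ** 2)
    # min() protects asin against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))
```

With this radius, one degree of latitude corresponds to roughly 111.2 kilometers, so the 800-meter threshold used in the embodiment spans well under one hundredth of a degree.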

Step S425, checking each cluster center in the final cluster center set: if the total number of passenger start point data and passenger destination data in all grids of its cluster meets the preset quantity threshold (in this embodiment, is greater than or equal to it), the cluster center is output as bus stop data to be added. In this embodiment, the preset quantity threshold is the average number of passenger start point data and passenger destination data per cluster.
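Steps S424 and S425 can be sketched in Python as follows. This is an illustrative simplification, not the embodiment's implementation: whole grids, rather than individual trip records, are assigned to the nearest center; the maximum distance threshold of step S422 is not applied because the text does not state how it enters the assignment; and K is passed in directly instead of being chosen by the elbow method. All function names, `max_iter` and `seed` are hypothetical.

```python
import math
import random


def dist_m(p, q):
    """Haversine distance in metres between tuples whose first items are (lon, lat)."""
    phi1, phi2 = math.radians(p[1]), math.radians(q[1])
    a, b = phi1 - phi2, math.radians(p[0] - q[0])
    h = math.sin(a / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(b / 2) ** 2
    return 2 * 6_371_000 * math.asin(min(1.0, math.sqrt(h)))


def assign(grids, centers):
    """Map each centre index to the indices of the grids nearest to it."""
    clusters = {c: [] for c in centers}
    for i, g in enumerate(grids):
        nearest = min(centers, key=lambda c: dist_m(g, grids[c]))
        clusters[nearest].append(i)
    return clusters


def cluster_grids(grids, k, max_iter=50, seed=0):
    """grids: list of (lon, lat, count). Returns (lon, lat, cluster_count)
    for every final centre whose cluster reaches the mean cluster count."""
    rng = random.Random(seed)
    centers = rng.sample(range(len(grids)), k)        # S424: random initial centres
    for _ in range(max_iter):
        clusters = assign(grids, centers)
        # New centre of a cluster: its member grid with the most records.
        new_centers = [max(m, key=lambda i: grids[i][2]) if m else c
                       for c, m in clusters.items()]
        new_centers = list(dict.fromkeys(new_centers))  # drop accidental duplicates
        if set(new_centers) == set(centers):
            break                                       # converged
        centers = new_centers
    clusters = assign(grids, centers)
    totals = {c: sum(grids[i][2] for i in m) for c, m in clusters.items()}
    threshold = sum(totals.values()) / len(totals)      # S425: mean records per cluster
    return [(grids[c][0], grids[c][1], t) for c, t in totals.items() if t >= threshold]
```

As with any randomly initialized clustering, the result can depend on the initial centers, so running it with several seeds is a reasonable precaution.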

Illustratively, step five comprises:

Specifically, in this embodiment, an area is divided around each abnormal site in the abnormal site result set, with the site as the circle center and a preset distance of 800 meters as the radius, to obtain the abnormal site area result set.
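The filtering in step five, which keeps only the candidate stops that fall inside the circular area of some abnormal site, can be sketched in Python as follows. The function names and the tuple layout (longitude, latitude) are assumptions; the 800-meter default radius is the value given in this embodiment.

```python
import math


def dist_m(p, q):
    """Haversine distance in metres between (lon, lat) pairs."""
    phi1, phi2 = math.radians(p[1]), math.radians(q[1])
    a, b = phi1 - phi2, math.radians(p[0] - q[0])
    h = math.sin(a / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(b / 2) ** 2
    return 2 * 6_371_000 * math.asin(min(1.0, math.sqrt(h)))


def final_additions(abnormal_stops, candidates, radius_m=800.0):
    """Keep each candidate stop lying within radius_m of at least one abnormal stop."""
    return [c for c in candidates
            if any(dist_m(c, a) <= radius_m for a in abnormal_stops)]
```

A candidate roughly 370 meters east of an abnormal stop would be kept, while one about 1.8 kilometers away would be discarded.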

According to the embodiment of the invention, by carrying out combined analysis on the bus track data, the bus card swiping data, the city interest point data and the taxi order data, abnormal stations and stations to be added are identified, and the accuracy and the data utilization rate of bus station optimization are improved.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. The bus stop optimizing method based on the multi-source data is characterized by comprising the following steps of:

Step one, collecting bus track data, bus card swiping data, city interest point data and taxi order data;

step two, bus passenger flow data are obtained according to the bus track data and the bus card swiping data, and taxi passenger trip data are obtained according to the taxi order data; the bus passenger flow data comprise bus stop data and corresponding stop passenger flow data, and the taxi passenger trip data comprise passenger starting place data and corresponding passenger destination data;

Step three, constructing a bus abnormal station identification model based on an NSGA-II algorithm according to the bus passenger flow data and the city interest point data, and solving to obtain an abnormal station result set;

step four, carrying out grid division on the taxi passenger travel data to obtain grid data, and carrying out cluster analysis on the grid data to obtain bus stop data to be added;

Step five, optimizing bus stops according to the abnormal stop result set and the bus stop data to be added;

The third step comprises:

Step S31, inputting the bus stop data and the corresponding stop passenger flow data;

Step S32, initializing the total number of abnormal bus stops, setting the maximum scale and the maximum iteration number of the population, and initializing the iteration number;

step S33, randomly generating an initial population, namely a random bus stop set;

step S34, calculating an objective function value of each bus station, and dividing the individuals into different non-dominant fronts;

step S35, sorting the individuals according to the non-dominant grade and the crowding degree;

step S36, selecting individuals according to the non-dominant grade and the crowding degree, and constructing a new population;

step S37, performing cross operation on the selected individuals to generate new individuals;

step S38, performing genetic variation operation on the current population, and introducing new variation;

step S39, combining the initial population with new individuals to generate a new generation population, adding one to the iteration number, returning to step S34 for circulation until the iteration number is equal to the maximum iteration number, and outputting non-dominant individuals of the final population, namely the abnormal site result set;

in step S33, the random generation initial population adopts a roulette probability selection method;

step S33 includes: step S331, calculating fitness values for each individual in the generated population;

Step S332, converting the fitness value of each individual into a selection probability, where the selection probability is calculated by dividing the fitness value of each individual by the sum of the fitness values of all individuals in the population, and the selection probability E_i is expressed as follows:

$$E_i = \frac{\sum_{j=1}^{m} F_j}{\sum_{i=1}^{n} \sum_{k=1}^{m} F_{ik}}$$

wherein F represents each fitness value, i.e., objective function value, n represents the total number of sites, m represents the total number of fitness functions, i.e., the total number of objective functions, j represents the j-th objective function of the current site, i represents the current site being the i-th site, k represents the k-th objective function of the i-th site, F_j represents the objective function value of the current site, and F_ik represents the function value of the k-th objective function of the i-th site;

Step S333, mapping the selection probability to the wheel disc, so that the sector corresponding to the individual with higher fitness is larger, and calculating the cumulative probability, where the formula of the cumulative probability C is as follows:

$$C = \sum_{i=1}^{n} E_i$$

wherein E_i represents the selection probability of the i-th station, i represents the i-th station, and n represents the total number of stations;

step S334, spinning the pointer on the roulette wheel and selecting the individual corresponding to the sector at which the pointer stops;

step S335, repeating step S333 and step S334 until the required number of individuals has been selected to form the next-generation population;
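Steps S331 to S335 can be sketched in Python as follows. This is one possible reading, not the claimed implementation: the fitness of an individual is taken to be the sum of its objective values, all values are assumed non-negative, and the names `selection_probabilities` and `roulette_select` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Sequence


def selection_probabilities(fitness: Sequence[Sequence[float]]) -> List[float]:
    """fitness[i][k] is the k-th objective value of individual i (assumed >= 0).
    Each probability is the individual's total divided by the population total."""
    totals = [sum(row) for row in fitness]
    grand_total = sum(totals)
    if grand_total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    return [t / grand_total for t in totals]


def roulette_select(
    fitness: Sequence[Sequence[float]],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Spin the wheel `count` times and return the selected indices."""
    rng = rng or random.Random()
    probs = selection_probabilities(fitness)
    cumulative = list(accumulate(probs))  # running sums: the sector boundaries
    chosen = []
    for _ in range(count):
        # Scale by the last running sum so rounding error cannot leave a gap.
        pointer = rng.random() * cumulative[-1]
        index = bisect_right(cumulative, pointer)
        chosen.append(min(index, len(probs) - 1))
    return chosen


population = [[1.0, 1.0], [2.0, 2.0], [0.0, 4.0]]
print(selection_probabilities(population))
print(roulette_select(population, count=5, rng=random.Random(0)))
```

Using running sums with a binary search keeps each spin at O(log n), and an individual whose fitness total is zero occupies an empty sector and is never selected.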

in step S34, the objective functions include: an in-range passenger flow difference objective function, a same-line passenger flow difference objective function, an other-line passenger flow difference objective function, and a city interest point objective function;

the formula of the in-range passenger flow difference objective function is as follows:

wherein P_i is the passenger flow of the current site, P_j is the passenger flow of another site within the surrounding one-kilometer range, and S is the total number of sites within that range;

the formula of the same-line passenger flow difference objective function is as follows:

wherein P_i is the passenger flow of the current site, and P_z and P_k are the passenger flows of the preceding site and the following site on the same line as the current site, respectively;

the formula of the other-line passenger flow difference objective function is as follows:

wherein P_i is the passenger flow of the current site, P_m is the passenger flow of another line serving the site, and M is the number of other lines serving the site;

the formula of the city interest point objective function is as follows:

wherein P_i is the passenger flow of the current site, P_o is a city interest point around the site, O is the number of city interest points within the preset range around the site, and ω is the weight corresponding to each city interest point type;

the fifth step comprises:

delimiting an area centered on each abnormal site in the abnormal site result set, with a preset distance as the radius, to obtain an abnormal site area result set;

2. The method for optimizing bus stops based on multi-source data according to claim 1, wherein in the second step, obtaining the bus passenger flow data according to the bus track data and the bus card swiping data comprises: preprocessing the bus track data and the bus card swiping data to obtain preprocessed track data and preprocessed card swiping data; and matching the preprocessed track data with the preprocessed card swiping data to obtain the bus passenger flow data.

3. The method for optimizing bus stops based on multi-source data according to claim 2, wherein in the second step, matching the preprocessed track data with the preprocessed card swiping data to obtain the bus passenger flow data comprises:

step S21, inputting the preprocessed card swiping data;

step S22, inputting the preprocessed track data;

step S23, performing line matching between the preprocessed track data and the preprocessed card swiping data;

step S24, performing vehicle matching and time matching on the line-matched preprocessed track data and preprocessed card swiping data to obtain bus passenger travel data.
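Steps S21 to S24 can be pictured with the following Python sketch, which joins card swipes to track points first by line and vehicle and then by nearest timestamp. The record fields (`line_id`, `vehicle_id`, `timestamp`, `stop_id`, `card_id`), the function name `match_swipes`, and the 120-second tolerance `max_gap_s` are all assumptions made for illustration; the claim does not specify them.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrackPoint:
    line_id: str
    vehicle_id: str
    timestamp: int  # seconds since some epoch (assumed unit)
    stop_id: str


@dataclass(frozen=True)
class Swipe:
    card_id: str
    line_id: str
    vehicle_id: str
    timestamp: int


def match_swipes(
    swipes: List[Swipe], track: List[TrackPoint], max_gap_s: int = 120
) -> List[Tuple[Swipe, str]]:
    """Pair each swipe with the stop of the nearest-in-time track point on the
    same line and vehicle, dropping swipes with no point within max_gap_s."""
    index: Dict[Tuple[str, str], List[TrackPoint]] = defaultdict(list)
    for point in track:
        index[(point.line_id, point.vehicle_id)].append(point)  # line + vehicle
    for points in index.values():
        points.sort(key=lambda p: p.timestamp)
    times = {key: [p.timestamp for p in pts] for key, pts in index.items()}

    matched = []
    for swipe in swipes:
        key = (swipe.line_id, swipe.vehicle_id)
        points = index.get(key)
        if not points:
            continue  # no vehicle on this line reported a track
        stamps = times[key]
        pos = bisect_left(stamps, swipe.timestamp)
        candidates = [c for c in (pos - 1, pos) if 0 <= c < len(points)]
        best = min(candidates, key=lambda c: abs(stamps[c] - swipe.timestamp))
        if abs(stamps[best] - swipe.timestamp) <= max_gap_s:  # time matching
            matched.append((swipe, points[best].stop_id))
    return matched
```

Counting the matched pairs per `stop_id` would then give a per-stop boarding figure, which is one way the passenger flow used by the later steps could be derived.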

4. The method for optimizing bus stops based on multi-source data according to claim 1, wherein in the fourth step, performing grid division on the taxi passenger travel data to obtain grid data comprises:

step S411, inputting the taxi passenger travel data;

Step S412, initializing a grid list;

step S413, adding each taxi passenger travel record to the grid list;

step S414, calculating the longitude and latitude of each grid and the number of taxi passenger travel records in each grid, and removing the grids that contain no data;

step S415, outputting a grid set, i.e. the grid data.
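Steps S411 to S415 can be sketched in Python as below. The cell size `cell_deg`, the use of the cell center as the longitude and latitude of a grid, and the name `build_grid` are assumptions for illustration only; the claim does not fix any of them.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def build_grid(
    points: List[Tuple[float, float]], cell_deg: float = 0.005
) -> List[Dict[str, float]]:
    """Bin (lon, lat) travel points into square cells of cell_deg degrees and
    return one entry per occupied cell with its center and record count."""
    cells: Dict[Tuple[int, int], int] = defaultdict(int)  # the grid list
    for lon, lat in points:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells[key] += 1  # add the record to its cell

    grid = []
    for (ix, iy), count in sorted(cells.items()):
        grid.append(
            {
                "lon": (ix + 0.5) * cell_deg,  # cell center longitude
                "lat": (iy + 0.5) * cell_deg,  # cell center latitude
                "count": count,
            }
        )
    # Cells are only created when a point falls in them, so grids without
    # data never appear and need no separate removal pass.
    return grid


# Three hypothetical taxi drop-off points; the first two share a cell.
sample = [(108.9401, 34.2601), (108.9402, 34.2602), (108.9601, 34.2801)]
for cell in build_grid(sample):
    print(cell)
```

Keeping only occupied cells in a dictionary avoids allocating a dense grid over the whole city, which matters when most cells contain no taxi records.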

5. The method for optimizing bus stops based on multi-source data according to claim 1, wherein in the fourth step, performing cluster analysis on the grid data to obtain bus stop data to be added comprises:

step S421, inputting the grid data and initializing a cluster center set;

Step S422, setting a maximum distance threshold;

step S423, selecting the number of clusters;

step S424, randomly selecting a plurality of grids as the initial cluster center set, assigning each taxi passenger travel record to a cluster center, updating the cluster centers, and iterating until convergence to obtain a final cluster center set;