CN117875674A - Bus scheduling method based on Q-learning - Google Patents
- Publication number: CN117875674A
- Application number: CN202410269459.5A
- Authority: CN (China)
- Prior art keywords: data, passenger flow, bus, learning
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The invention discloses a bus dispatching method based on Q-learning, comprising the following steps: step one, acquiring historical operation data of a public transportation system; step two, obtaining expected passenger flow data within a preset time from the historical operation data; step three, constructing a scheduling model from the expected passenger flow data using a Q-learning algorithm; and step four, applying the scheduling model to actual bus system operation. In this method, the expected passenger flow data within the preset time are predicted from the historical operation data of the bus system, and the scheduling model is built from the expected passenger flow data with the Q-learning algorithm, which strengthens the correlation between the collected data and dispatching decisions and thereby improves the accuracy of bus dispatching.
Description
Technical Field
The invention relates to the technical field of traffic scheduling, in particular to a bus scheduling method based on Q-learning.
Background
With the acceleration of urbanization in China, public transportation systems play an increasingly important role in relieving urban traffic congestion. Bus Rapid Transit (BRT), a high-capacity and efficient mode of public transportation, has been widely deployed worldwide. How to schedule BRT effectively so as to improve operating efficiency and meet passenger demand has become an urgent problem.
In the prior art, patent publication CN116895144A, "Deep reinforcement learning-based electric bus dynamic scheduling system and method", adopts the deep reinforcement learning DQN algorithm and designs a deep reinforcement learning program for electric bus scheduling. Its cost function accounts for service efficiency, service reliability and operating cost, and is trained so that the agent can generate scheduling decisions from the trained neural network. The method considers the short driving range and long charging time that distinguish electric buses from fuel buses, and simultaneously accounts for passenger requests, the battery level of the electric buses, and the location and state of the charging stations, so it can provide optimal charging planning and scheduling decisions.
When dispatching buses, the influence of indirect factors such as operating revenue and operating cost cannot be ignored; moreover, the expected passenger flow reflects the number of passengers arriving at a station within the expected time and thus influences the dispatching decision. The prior art, however, does not consider the expected passenger flow, so the collected data correlate poorly with dispatching decisions. In addition, the DQN algorithm used in the prior art cannot guarantee convergence and, in reinforcement learning research on bus dispatching, easily falls into suboptimal policies, yielding poor dispatching accuracy.
Disclosure of Invention
The invention provides a bus dispatching method based on Q-learning to solve the prior-art problems of poor correlation between the collected bus-dispatching data and dispatching decisions, and poor accuracy of bus dispatching.
In one aspect, the invention provides a bus dispatching method based on Q-learning, which comprises the following steps:
step one, historical operation data of a public transportation system is obtained.
And step two, obtaining expected passenger flow data in preset time according to the historical operation data.
And thirdly, constructing a scheduling model according to the expected passenger flow data by using a Q-learning algorithm.
And step four, applying the scheduling model to actual bus system operation.
In one possible implementation manner, in step one, the historical operation data includes: departure time data, line data, arrival time data, GPS track data, and swipe card data.
In one possible implementation, the second step includes:
and obtaining historical passenger flow data according to the historical operation data.
And obtaining the expected passenger flow data according to the historical passenger flow data by adopting a space-time graph convolutional network.
In a possible implementation manner, in the third step, the step of constructing a scheduling model according to the expected passenger flow data by using a Q-learning algorithm includes:
A Q matrix is created, in which rows represent states and columns represent actions.
And training the Q matrix by adopting the expected passenger flow data to obtain a trained Q matrix, namely the scheduling model.
In a possible implementation manner, in step three, the training the Q matrix using the expected passenger traffic data includes:
and A, initializing the current state as an initial state.
And B1, selecting a decision action by using an epsilon-greedy strategy according to the current state and the Q matrix.
And B2, executing the decision action to obtain a new state.
And B3, observing the new state and the instant rewards.
And B4, updating the new Q value into the Q matrix.
And B5, setting the new state as the current state.
And B6, if the preset training step number is reached or the end point state is reached, entering the next step, otherwise, returning to the step B1.
And C, if the preset training times are reached, finishing training, otherwise, returning to the step A.
In one possible implementation, in step B3, the instant reward is obtained by a pre-designed reward function.
The reward function includes: an operating revenue reward function, an operating cost reward function, and a passenger time cost reward function.
In one possible implementation, in step B4, a Q-learning update strategy is used to perform the Q-value update.
In a possible implementation manner, in the third step, after the scheduling model is obtained, the expected passenger flow data is further used to test the scheduling model.
In one possible implementation, the fourth step includes:
and acquiring real-time operation data of the public transport system.
And obtaining expected passenger flow data corresponding to the real-time operation data, inputting the expected passenger flow data into the scheduling model, and outputting a scheduling decision.
And carrying out actual scheduling according to the scheduling decision.
In one possible implementation manner, the step four further includes: and performing performance evaluation on the scheduling model.
And when the performance evaluation result is lower than the preset expectation, returning to the step one for circulation until the performance evaluation result is higher than or equal to the preset expectation.
The bus scheduling method based on Q-learning has the following advantages:
the method comprises the steps of predicting expected passenger flow data in preset time through historical operation data of a bus system, constructing a scheduling model according to the expected passenger flow data by utilizing a Q-learning algorithm, improving the correlation degree of collected data and scheduling decisions, and improving the accuracy of bus scheduling; the expected passenger flow data is obtained according to the historical passenger flow data by adopting the space-time diagram convolution network, so that the accuracy of the expected passenger flow data is improved; the proposed reward function includes: the operation income rewarding function, the operation cost rewarding function and the passenger time cost rewarding function improve the accuracy of bus dispatching by considering the influence of the passenger time cost; and when the performance evaluation result is lower than the preset expectation, returning to the step one for circulation until the performance evaluation result is higher than or equal to the preset expectation, thereby improving the optimizability of the scheduling model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a bus dispatching method based on Q-learning according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of another bus scheduling method based on Q-learning according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the embodiment of the invention provides a bus dispatching method based on Q-learning, which comprises the following steps:
step one, historical operation data of a public transportation system is obtained.
And step two, obtaining expected passenger flow data in preset time according to the historical operation data.
And thirdly, constructing a scheduling model according to the expected passenger flow data by using a Q-learning algorithm.
And step four, applying the scheduling model to actual bus system operation.
Illustratively, in step one, the historical operating data includes: departure time data, line data, arrival time data, GPS track data, and swipe card data.
Illustratively, step two comprises:
and obtaining historical passenger flow data according to the historical operation data.
And obtaining the expected passenger flow data according to the historical passenger flow data by adopting a space-time graph convolutional network.
Specifically, in this embodiment the historical operation data are aggregated at 5-minute intervals: by jointly analyzing the departure time data, line data, arrival time data, GPS track data and card swiping data, historical passenger flow data for every 5-minute interval are obtained, including the passenger flow of each station at a given historical moment.
The historical passenger flow data are then analyzed and predicted by a trained space-time graph convolutional network to obtain the expected passenger flow data. The training process of the space-time graph convolutional network is as follows: data in the historical passenger flow training set (i.e., a training data set built from the historical passenger flow data) are taken as input to the network, which outputs expected passenger flow training data; the actual passenger flow data at the corresponding expected time are taken as the target value, and after a number of training iterations the loss function is minimized, yielding the trained space-time graph convolutional network.
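As an illustration of this data preparation, the following is a minimal Python sketch of aggregating raw operation records into 5-minute per-station passenger flow counts; the column names (`swipe_time`, `station_id`) and the use of pandas are assumptions for illustration, not part of the patented method.

```python
# Minimal sketch: aggregate raw card-swipe records into 5-minute
# per-station passenger flow counts. Column names are assumed.
import pandas as pd

def historical_passenger_flow(records: pd.DataFrame) -> pd.DataFrame:
    """records: one row per card swipe, with 'swipe_time' and 'station_id'."""
    records = records.copy()
    records["swipe_time"] = pd.to_datetime(records["swipe_time"])
    # Bucket each swipe into its 5-minute interval, then count per station.
    records["interval"] = records["swipe_time"].dt.floor("5min")
    flow = (records.groupby(["interval", "station_id"])
                   .size()
                   .rename("passengers")
                   .reset_index())
    # Pivot to an interval x station matrix, the input shape a
    # spatio-temporal graph convolutional network typically expects.
    return flow.pivot(index="interval", columns="station_id",
                      values="passengers").fillna(0)
```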
Illustratively, in the third step, the step of constructing a scheduling model according to the expected passenger flow data by using a Q-learning algorithm includes:
A Q matrix is created, in which rows represent states and columns represent actions.
And training the Q matrix by adopting the expected passenger flow data to obtain a trained Q matrix, namely the scheduling model.
Specifically, in this embodiment the agent of the Q matrix is a bus; a state is the passenger flow of each bus line; an action is selecting a certain time and a certain bus line for a departure. The agent traverses all lines, obtains the maximum Q value over the action combinations in the current state, takes the action corresponding to that maximum Q value, and then transitions to the next state.
In this embodiment, all initial Q values are set to 0. The learning rate α is set to 0.5 and the discount factor γ to 0.9; the number of training episodes is set to 100, the maximum number of training steps to 20, and the ε value of the ε-greedy strategy to 0.2. In other possible embodiments, these parameters may be adjusted to the actual situation.
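A minimal sketch of this tabular setup, under the parameter values given above, might look as follows; the state and action space sizes are illustrative assumptions, since the embodiment does not fix them.

```python
# Minimal sketch of the tabular Q-learning setup described above.
import numpy as np

N_STATES = 50    # discretized line passenger-flow levels (assumed size)
N_ACTIONS = 10   # (time slot, line) departure combinations (assumed size)

Q = np.zeros((N_STATES, N_ACTIONS))  # all initial Q values set to 0

ALPHA = 0.5      # learning rate
GAMMA = 0.9      # discount factor
EPISODES = 100   # number of training episodes
MAX_STEPS = 20   # maximum training steps per episode
EPSILON = 0.2    # exploration rate of the epsilon-greedy strategy
```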
Illustratively, in the third step, the training the Q matrix using the expected passenger flow data includes:
and A, initializing the current state as an initial state.
And B1, selecting a decision action by using an epsilon-greedy strategy according to the current state and the Q matrix.
And B2, executing the decision action to obtain a new state.
And B3, observing the new state and the instant rewards.
And B4, updating the new Q value into the Q matrix.
And B5, setting the new state as the current state.
And B6, if the preset training step number is reached or the end point state is reached, entering the next step, otherwise, returning to the step B1.
And C, if the preset training times are reached, finishing training, otherwise, returning to the step A.
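A minimal sketch of the training loop in steps A to C follows, assuming the setup above and an environment object `env` with `reset()` and `step()` methods returning `(next_state, reward, done)` — an interface assumed here for illustration, not specified by the patent.

```python
# Minimal sketch of training steps A, B1-B6 and C.
import numpy as np

rng = np.random.default_rng(0)

def train(env, Q):
    for _ in range(EPISODES):                      # step C: repeat episodes
        state = env.reset()                        # step A: initial state
        for _ in range(MAX_STEPS):                 # step B6: step budget
            # Step B1: epsilon-greedy action selection.
            if rng.random() < EPSILON:
                action = int(rng.integers(Q.shape[1]))
            else:
                action = int(np.argmax(Q[state]))
            # Steps B2-B3: execute, observe new state and instant reward.
            next_state, reward, done = env.step(action)
            # Step B4: update the new Q value into the Q matrix.
            Q[state, action] += ALPHA * (
                reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
            state = next_state                     # step B5
            if done:                               # end-point state reached
                break
    return Q
```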
Illustratively, in step B3, the instant reward is obtained by a pre-designed reward function.
The reward function includes: an operating revenue reward function, an operating cost reward function, and a passenger time cost reward function.
Specifically, considering the operating revenue, the operating cost and the passenger time cost, the reward function R can be expressed as

R = R_I - R_O - R_P

where R_I denotes the operating revenue, R_O the operating cost, and R_P the passenger time cost.

The operating revenue reward function is as follows:

R_I = \sum_{j=1}^{n} k_j \cdot s

where k_j represents the number of passengers at station j and s represents the fare.

The operating cost of a public transportation enterprise comprises a fixed cost and the vehicle running cost; since the vehicle running cost is positively correlated with the running mileage, the operating cost is represented directly by the vehicle running cost. The operating cost reward function is as follows:

R_O^{ij} = p \cdot d_{ij}, \qquad R_O = p \sum_{i=0}^{n} d_{i,i+1}

where R_O^{ij} represents the running cost between the current station i and station j, d_{ij} the distance between stations i and j, p the unit fuel consumption cost, and n the number of stations; the departure point is recorded as station 0 and the parking place as station n+1.

Assuming passengers arrive at the stop by the scheduled bus arrival time, the passenger time cost is the waiting cost incurred when the bus reaches the stop later than scheduled. The passenger time cost reward function is as follows:

C_j^P = k_j \cdot c_t \cdot \max(t_j^a - t_j^l,\ 0), \qquad R_P = \sum_{j=1}^{n} C_j^P

where C_j^P represents the time cost of the passengers at station j; k_j the number of passengers at station j; t_j^a the actual time at which the bus arrives at station j; t_j^l the latest time of the time window of station j; c_t the preset time value of a passenger, i.e. the value corresponding to the time saved by taking the bus, set to 50 yuan per hour in this embodiment; and ε a preset small positive number, set to 0.0001 in this embodiment, which is added to the denominator in the implementation to avoid division by zero.

In summary, the reward function is as follows:

R = \sum_{j=1}^{n} k_j s - p \sum_{i=0}^{n} d_{i,i+1} - \sum_{j=1}^{n} C_j^P
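Under the reconstruction above, a minimal Python sketch of the composite reward might read as follows; the function signatures and unit conventions (arrival times in hours, distances in kilometres) are assumptions for illustration.

```python
# Minimal sketch of the composite reward R = R_I - R_O - R_P.

def revenue(passengers_per_station, fare):
    """R_I: fare revenue from all boarding passengers."""
    return fare * sum(passengers_per_station)

def operating_cost(segment_distances_km, cost_per_km):
    """R_O: running cost, proportional to mileage from station 0 to n+1."""
    return cost_per_km * sum(segment_distances_km)

def passenger_time_cost(passengers, actual_arrival_h, latest_arrival_h,
                        time_value_per_hour=50.0):
    """R_P: waiting cost when the bus arrives after the time window.

    Arrival times are in hours so they match the per-hour time value."""
    return sum(k * time_value_per_hour * max(ta - tl, 0.0)
               for k, ta, tl in zip(passengers, actual_arrival_h,
                                    latest_arrival_h))

def reward(passengers, fare, segment_distances_km, cost_per_km,
           actual_arrival_h, latest_arrival_h):
    return (revenue(passengers, fare)
            - operating_cost(segment_distances_km, cost_per_km)
            - passenger_time_cost(passengers, actual_arrival_h,
                                  latest_arrival_h))
```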
illustratively, in step B4, the Q value is updated using a Q-learning update strategy.
Specifically, a memory matrix M is first defined, which sequentially records every state s_t experienced by the agent and the corresponding action a_t; the memory matrix is set as a matrix of h rows and 2 columns, where h represents the number of states experienced from the initial moment to the current moment. Taking (s_t, a_t) as the index, the Q value corresponding to the previous "state-action" pair is found and updated; t is then decreased by 1 and whether t - 1 equals 0 is judged: if it is 0, the Q values of all "state-action" pairs experienced before state s_t have been updated; if not, the Q value of the next "state-action" pair is looked up and updated, until all Q values have been updated. The Q-learning update strategy is as follows:

Q(s_g, a_g) \leftarrow Q(s_g, a_g) + \alpha \left[ r_g + \gamma \max_a Q(s_{g+1}, a) - Q(s_g, a_g) \right]

where Q(s_g, a_g) represents the Q value after taking action a_g in state s_g; s_g represents the state at time g; a_g the action taken in state s_g; r_g the instant reward for taking action a_g in state s_g; γ the discount coefficient; α the learning rate; and max_a Q(s_{g+1}, a) the maximum Q value obtainable in the successor state, with g = t - 1, t - 2, …, 2, 1.
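A minimal sketch of this backward update over the recorded trajectory follows; the trajectory representation — a list of (state, action, reward, next state) tuples — is chosen here for illustration and is an assumption beyond the memory matrix described above.

```python
# Minimal sketch of the backward replay update: after an episode, the
# recorded trajectory is walked from the last step back to the first so
# that later rewards propagate to earlier "state-action" pairs.
import numpy as np

def backward_update(Q, trajectory, alpha=0.5, gamma=0.9):
    """trajectory: list of (state, action, reward, next_state) tuples."""
    # g runs t-1, t-2, ..., 1 in the patent's notation; Python indices
    # run down to 0 over the same recorded steps.
    for g in range(len(trajectory) - 1, -1, -1):
        s, a, r, s_next = trajectory[g]
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```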
In the third step, the scheduling model is further tested by using the expected passenger flow data after the scheduling model is obtained.
Specifically, the testing step includes:
a', initializing the current state as the initial state.
B1', selecting the action with the maximum Q value as the decision action according to the current state and the trained Q matrix.
And B2', executing the decision action to obtain a new state.
And B3', observing the new state and the instant rewards.
And B4', setting the new state as the current state.
And B5 ', ending the test if the end point state is reached, otherwise returning to the step B1'.
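A minimal sketch of this greedy test rollout, reusing the `env` interface assumed in the training sketch:

```python
# Minimal sketch of test steps A' to B5': actions are chosen greedily
# (maximum Q value), with no exploration.
import numpy as np

def test(env, Q, max_steps=1000):
    state = env.reset()                          # step A': initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = int(np.argmax(Q[state]))        # step B1': greedy action
        state, reward, done = env.step(action)   # steps B2'-B4'
        total_reward += reward                   # step B3': instant reward
        if done:                                 # step B5': end-point state
            break
    return total_reward
```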
Illustratively, step four comprises:
and acquiring real-time operation data of the public transport system.
And obtaining expected passenger flow data corresponding to the real-time operation data, inputting the expected passenger flow data into the scheduling model, and outputting a scheduling decision.
And carrying out actual scheduling according to the scheduling decision.
Specifically, the expected passenger flow data corresponding to the real-time operation data are taken as the current state of the scheduling model, and the action with the maximum Q value is selected as the decision action, i.e. the scheduling decision; a bus departure timetable or route plan is then generated according to the scheduling decision.
As shown in fig. 2, step four further includes, illustratively: and performing performance evaluation on the scheduling model.
And when the performance evaluation result is lower than the preset expectation, returning to the step one for circulation until the performance evaluation result is higher than or equal to the preset expectation.
Specifically, the generated bus departure timetable or route plan is used for a simulation run. Based on the simulation results, performance indicators of the bus system are evaluated, including passenger waiting time and vehicle utilization. When the performance evaluation result is below the preset expectation, the method returns to step one, adjusts hyperparameters such as the learning rate and the discount factor, and retrains and retests until the performance evaluation result meets or exceeds the preset expectation.
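A minimal sketch of this evaluate-and-retune cycle; the hyperparameter grids, the scoring convention (higher is better) and the function names are assumptions for illustration only.

```python
# Minimal sketch: retrain with adjusted hyperparameters until the
# simulated performance meets the preset expectation.

def evaluate_and_tune(train_fn, simulate_fn, expected_score,
                      alphas=(0.5, 0.3, 0.1), gammas=(0.9, 0.95)):
    for alpha in alphas:
        for gamma in gammas:
            model = train_fn(alpha=alpha, gamma=gamma)
            # simulate_fn is assumed to return a single score combining
            # passenger waiting time and vehicle utilization.
            score = simulate_fn(model)
            if score >= expected_score:
                return model, score
    raise RuntimeError("no hyperparameter setting met the expectation")
```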
According to the embodiment of the invention, the expected passenger flow data within the preset time are predicted from the historical operation data of the bus system, and the scheduling model is constructed from the expected passenger flow data using the Q-learning algorithm, which strengthens the correlation between the collected data and dispatching decisions and improves the accuracy of bus dispatching; the expected passenger flow data are obtained from the historical passenger flow data by a space-time graph convolutional network, improving the accuracy of the expected passenger flow data; the proposed reward function includes an operating revenue reward function, an operating cost reward function and a passenger time cost reward function, and by accounting for the passenger time cost it further improves the accuracy of bus dispatching; and when the performance evaluation result is below the preset expectation, the method returns to step one and loops until the result meets or exceeds the expectation, improving the optimizability of the scheduling model.
In one possible embodiment, the research principle of the present application is as follows:
according to whether a bus company adjusts the line capacity configuration, the dispatching operation modes mainly comprise two modes: capacity is unchanged and capacity is increased.
Unchanged transport capacity. In this operation mode, the number of buses allocated to the bus route is not adjusted; by adjusting the departure interval of the whole-course (all-stop) service, part of the capacity is reserved for BRT dispatching. The BRT and the whole-course buses are identical in operating route, vehicle configuration and right-of-way use, differing only in stops served and departure frequency. This mode is suitable for lines whose current capacity level is high, and this scheduling mode adds no extra vehicle acquisition cost for the bus company.
Increased transport capacity. In this operation mode, the existing whole-course departure plan is not adjusted; the bus company adds extra buses for BRT dispatching. The BRT and the whole-course buses are identical in operating route and right-of-way use, differing in stops served, departure frequency and vehicle configuration. This mode is suitable for lines whose current capacity level is low, and this scheduling mode increases the bus company's vehicle acquisition cost to some extent.
A BRT dispatching mode is needed when the passenger flow of a bus line shows concentrated demand, a multimodal distribution, and a large proportion of medium- and long-distance trips. Parameters such as the direction imbalance coefficient and the station imbalance coefficient are generally used to determine the opening direction of the BRT and its service stations.
Determining the opening direction. The BRT opening direction is determined mainly by the direction imbalance coefficient, defined as the ratio of the larger of the line's up-direction and down-direction passenger volumes to the average of the two. By comparing the imbalance coefficients of the up and down directions, when the passenger flow in one direction is significantly larger than in the other, BRT service is opened in the direction with the smaller passenger flow so as to increase vehicle turnover.
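A minimal sketch of the direction imbalance coefficient as defined above (assuming at least one direction carries passengers, so the average is nonzero):

```python
# Minimal sketch: larger of the up/down volumes divided by their average.

def direction_imbalance(up_volume: float, down_volume: float) -> float:
    average = (up_volume + down_volume) / 2.0
    return max(up_volume, down_volume) / average

# Example: 6000 passengers up vs 2000 down gives a coefficient of 1.5,
# indicating a strongly directional (tidal) passenger flow.
```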
Determining the stop stations. The number of BRT service stations directly affects passengers and buses: too few stations causes insufficient passenger flow, wasted resources and too little revenue; too many stations lengthens the BRT running time, eroding its advantage over the whole-course bus. Stations are selected mainly from the large-passenger-flow boarding and alighting stations and the transfer hubs. In the actual selection process, some intermediate stations relatively close to the origin or terminal have a small boarding volume but a large alighting volume, or a large boarding volume but a small alighting volume, so their station imbalance coefficient is small; they would then fail to enter the BRT station set, losing part of the passenger flow. Therefore, when selecting BRT stops, stations with larger passenger flow should also be brought into the BRT stop set.
Single whole-course bus dispatching cannot solve the uneven distribution of passenger flow along the line in time, station and direction. By studying the station passenger flow distribution law of unidirectional lines during peak hours, a peak-hour BRT dispatching strategy for unidirectional bus lines is proposed, which can provide a basis for peak-hour capacity configuration on bus lines with strong tidal characteristics.
The BRT scheduling policy may also negatively affect line operation, for example by slowing the turnover of whole-course buses and relatively increasing passenger waiting time at stations not served by the BRT. Therefore, the BRT scheduling problem mainly studies the optimal service frequency and the optimal service stations, so as to balance passenger benefits against the operating benefits of the bus company.
In one possible embodiment, the constraints of the scheduling policy mainly include: full load rate, station passenger waiting time, and departure frequency.
Full load rate: the full load rate is the ratio of the actual number of passengers in the vehicle to its rated passenger capacity, an index of vehicle utilization. Too high a full load rate greatly reduces riding comfort; too low a full load rate cannot guarantee the bus company's economic benefit. Therefore, on the premise of ensuring social benefit, the full load rate constraint safeguards the company's economic benefit and balances the interests of passengers and the bus company. According to urban public transportation management standards, the full load rate should not exceed 120% for passenger comfort and should not fall below 50% for the company's economic benefit.
Station passenger waiting time: according to the change in passengers' psychological state while waiting, passengers become anxious and restless after waiting 15 minutes and want to switch to another mode of transport, and after 50 minutes they lose hope and decide not to take the bus at all. The upper and lower limits of waiting time are therefore set to 25 min and 2 min respectively.
Departure frequency: the departure frequency is closely tied to the interests of both passengers and the bus company. Too low a frequency increases passenger waiting time and may lose part of the passenger flow; too high a frequency cannot guarantee the company's economic benefit. According to related studies, passenger satisfaction reaches 80% when the peak-hour departure interval is 3 min; the upper limit of the departure frequency is therefore set to 20 buses per hour.
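The three constraints above can be collected into a single feasibility check; the following minimal sketch simply encodes the stated bounds (full load rate 50-120%, waiting time 2-25 min, at most 20 departures per hour) and is an illustration, not part of the claimed method.

```python
# Minimal sketch: feasibility check for a candidate scheduling decision.

def feasible(load_rate: float, wait_min: float, buses_per_hour: float) -> bool:
    return (0.50 <= load_rate <= 1.20        # full load rate constraint
            and 2.0 <= wait_min <= 25.0      # station waiting time constraint
            and buses_per_hour <= 20.0)      # departure frequency constraint
```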
In one possible embodiment, Python is adopted as the simulation language. Real-time passenger flow information for each bus line is obtained by fusing multi-source data: bus GPS track data, bus arrival timetables, card swiping data and bus network information. When excessive line passenger flow or road-section congestion occurs, the number of vehicles is changed dynamically to meet riding demand and relieve congestion. The method uses the capacity-increasing bus dispatching mode; through reinforcement learning, the agent observes the change of each state in the environment, obtains action feedback from the environment, and learns the optimal control strategy that maximizes the overall reward.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. A bus dispatching method based on Q-learning is characterized by comprising the following steps:
step one, acquiring historical operation data of a public transportation system;
step two, expected passenger flow data in preset time are obtained according to the historical operation data;
thirdly, constructing a scheduling model according to the expected passenger flow data by utilizing a Q-learning algorithm;
and step four, applying the scheduling model to actual bus system operation.
2. The bus dispatching method based on Q-learning according to claim 1, wherein in the first step, the historical operation data includes: departure time data, line data, arrival time data, GPS track data, and swipe card data.
3. The bus dispatching method based on Q-learning according to claim 1, wherein the second step comprises:
obtaining historical passenger flow data according to the historical operation data;
and obtaining the expected passenger flow data according to the historical passenger flow data by adopting a space-time diagram convolution network.
4. The bus dispatching method based on Q-learning according to claim 1, wherein in the third step, the step of constructing a dispatching model according to the expected passenger flow data by using Q-learning algorithm includes:
creating a Q matrix, wherein rows represent states and columns represent actions;
and training the Q matrix by adopting the expected passenger flow data to obtain a trained Q matrix, namely the scheduling model.
5. The bus dispatching method based on Q-learning according to claim 4, wherein in step three, the training the Q matrix using the expected passenger flow data comprises:
a, initializing a current state as an initial state;
b1, selecting a decision action by using an epsilon-greedy strategy according to the current state and the Q matrix;
b2, executing the decision action to obtain a new state;
b3, observing a new state and instant rewards;
b4, updating the new Q value into the Q matrix;
b5, setting the new state as the current state;
if the preset training step number is reached or the end point state is reached, entering the next step, otherwise returning to the step B1;
and C, if the preset training times are reached, finishing training, otherwise, returning to the step A.
6. The bus dispatching method based on Q-learning according to claim 5, wherein in step B3, the instant reward is obtained by a pre-designed reward function;
the reward function includes: an operating revenue rewards function, an operating cost rewards function, and a passenger time cost rewards function.
7. The bus dispatching method based on Q-learning according to claim 5, wherein in step B4, Q value update is performed by using Q-learning update strategy.
8. The bus dispatching method based on Q-learning according to claim 4, wherein in step three, after the dispatching model is obtained, the expected passenger flow data is further adopted to test the dispatching model.
9. The bus dispatching method based on Q-learning as set forth in claim 1, wherein the fourth step comprises:
acquiring real-time operation data of a public transport system;
obtaining expected passenger flow data corresponding to the real-time operation data, inputting the expected passenger flow data into the scheduling model, and outputting a scheduling decision;
and carrying out actual scheduling according to the scheduling decision.
10. The bus dispatching method based on Q-learning according to claim 1, further comprising, after the fourth step: performing performance evaluation on the scheduling model;
and when the performance evaluation result is lower than the preset expectation, returning to the step one for circulation until the performance evaluation result is higher than or equal to the preset expectation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410269459.5A CN117875674B (en) | 2024-03-11 | 2024-03-11 | Bus scheduling method based on Q-learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410269459.5A CN117875674B (en) | 2024-03-11 | 2024-03-11 | Bus scheduling method based on Q-learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117875674A true CN117875674A (en) | 2024-04-12 |
CN117875674B CN117875674B (en) | 2024-06-21 |
Family
ID=90595083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410269459.5A Active CN117875674B (en) | 2024-03-11 | 2024-03-11 | Bus scheduling method based on Q-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117875674B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118378760A (en) * | 2024-06-24 | 2024-07-23 | 西北大学 | Bus network optimization method and device based on Q-learning algorithm |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2636537A1 (en) * | 2008-06-30 | 2009-12-30 | Autonomous Solutions, Inc. | Vehicle dispatching method and system |
CN111415048A (en) * | 2020-04-10 | 2020-07-14 | 大连海事大学 | Vehicle path planning method based on reinforcement learning |
CN112085249A (en) * | 2020-07-27 | 2020-12-15 | 北京工业大学 | Customized bus route planning method based on reinforcement learning |
CN113415322A (en) * | 2021-08-03 | 2021-09-21 | 东北大学 | High-speed train operation adjusting method and system based on Q learning |
CN113536692A (en) * | 2021-08-03 | 2021-10-22 | 东北大学 | Intelligent dispatching method and system for high-speed rail train in uncertain environment |
CN113673836A (en) * | 2021-07-29 | 2021-11-19 | 清华大学深圳国际研究生院 | Shared bus line-pasting scheduling method based on reinforcement learning |
CN114004452A (en) * | 2021-09-28 | 2022-02-01 | 通号城市轨道交通技术有限公司 | Urban rail scheduling method and device, electronic equipment and storage medium |
CN114117883A (en) * | 2021-09-15 | 2022-03-01 | 兰州理工大学 | Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning |
CN114841415A (en) * | 2022-04-12 | 2022-08-02 | 西南交通大学 | Urban rail transit passenger flow prediction and multistage transportation organization method during large-scale activities |
CN115880936A (en) * | 2022-11-25 | 2023-03-31 | 东风悦享科技有限公司 | Simulation system and method applied to intelligent dispatching of abnormal passenger flow of bus |
CN116050581A (en) * | 2022-12-14 | 2023-05-02 | 成都秦川物联网科技股份有限公司 | Smart city subway driving scheduling optimization method and Internet of things system |
CN117172461A (en) * | 2023-08-25 | 2023-12-05 | 河南科技大学 | Automatic driving bus dispatching system and bus dispatching method based on passenger flow prediction |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2636537A1 (en) * | 2008-06-30 | 2009-12-30 | Autonomous Solutions, Inc. | Vehicle dispatching method and system |
CN111415048A (en) * | 2020-04-10 | 2020-07-14 | 大连海事大学 | Vehicle path planning method based on reinforcement learning |
CN112085249A (en) * | 2020-07-27 | 2020-12-15 | 北京工业大学 | Customized bus route planning method based on reinforcement learning |
CN113673836A (en) * | 2021-07-29 | 2021-11-19 | 清华大学深圳国际研究生院 | Shared bus line-pasting scheduling method based on reinforcement learning |
CN113415322A (en) * | 2021-08-03 | 2021-09-21 | 东北大学 | High-speed train operation adjusting method and system based on Q learning |
CN113536692A (en) * | 2021-08-03 | 2021-10-22 | 东北大学 | Intelligent dispatching method and system for high-speed rail train in uncertain environment |
CN114117883A (en) * | 2021-09-15 | 2022-03-01 | 兰州理工大学 | Self-adaptive rail transit scheduling method, system and terminal based on reinforcement learning |
CN114004452A (en) * | 2021-09-28 | 2022-02-01 | 通号城市轨道交通技术有限公司 | Urban rail scheduling method and device, electronic equipment and storage medium |
CN114841415A (en) * | 2022-04-12 | 2022-08-02 | 西南交通大学 | Urban rail transit passenger flow prediction and multistage transportation organization method during large-scale activities |
CN115880936A (en) * | 2022-11-25 | 2023-03-31 | 东风悦享科技有限公司 | Simulation system and method applied to intelligent dispatching of abnormal passenger flow of bus |
CN116050581A (en) * | 2022-12-14 | 2023-05-02 | 成都秦川物联网科技股份有限公司 | Smart city subway driving scheduling optimization method and Internet of things system |
CN117172461A (en) * | 2023-08-25 | 2023-12-05 | 河南科技大学 | Automatic driving bus dispatching system and bus dispatching method based on passenger flow prediction |
Non-Patent Citations (4)
Title |
---|
彭理群 et al., "Research on cross-regional route planning of customized buses based on Q-learning", Journal of Transportation Systems Engineering and Information Technology, no. 01, 15 February 2020
王国磊 et al., "Application of Q-learning based on fuzzy clustering in dynamic scheduling", Computer Integrated Manufacturing Systems, no. 04, 15 April 2009
王鹏勇 et al., "Deep-reinforcement-learning-based decision method for airport taxi drivers", Computer and Modernization, no. 08, 15 August 2020
陆百川 et al., "Research on flexible bus scheduling in urban peripheries during off-peak hours", Journal of Chongqing Jiaotong University (Natural Science), no. 02, 15 February 2020
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118378760A (en) * | 2024-06-24 | 2024-07-23 | 西北大学 | Bus network optimization method and device based on Q-learning algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN117875674B (en) | 2024-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | A deep reinforcement learning approach to ride-sharing vehicle dispatching in autonomous mobility-on-demand systems | |
CN117875674B (en) | Bus scheduling method based on Q-learning | |
CN110458456B (en) | Demand response type public transportation system scheduling method and system based on artificial intelligence | |
CN110428096A (en) | The more cross-channel Transportation Organization optimization methods of urban track traffic based on ticket information | |
Venturini et al. | Linking narratives and energy system modelling in transport scenarios: A participatory perspective from Denmark | |
CN110598971B (en) | Responsive bus service planning method based on ant colony algorithm | |
Wu et al. | Joint optimization of timetabling, vehicle scheduling, and ride-matching in a flexible multi-type shuttle bus system | |
Wang et al. | Joint optimization of running route and scheduling for the mixed demand responsive feeder transit with time-dependent travel times | |
CN112700034B (en) | Method, device and equipment for selecting intermodal transport path and readable storage medium | |
CN110363329A (en) | One kind being based on the matched net of the bilateral satisfaction of supply and demand about vehicle worksheet processing method | |
Wu et al. | Predicting peak load of bus routes with supply optimization and scaled Shepard interpolation: A newsvendor model | |
CN115170006B (en) | Dispatching method, device, equipment and storage medium | |
CN109344991A (en) | A kind of public bus network highest section passenger flow forecasting | |
CN110570656A (en) | Method and device for customizing public transport line | |
WO2024174566A1 (en) | Multi-vehicle-type timetable design method and system for intelligent bus system | |
Kamel et al. | A modelling platform for optimizing time-dependent transit fares in large-scale multimodal networks | |
Cui et al. | Dynamic pricing for fast charging stations with deep reinforcement learning | |
Lee et al. | Optimal Automated Demand Responsive Feeder Transit Operation and Its Impact | |
Lu et al. | Demand-responsive transport for students in rural areas: A case study in vulkaneifel, germany | |
Sadrani et al. | Designing limited-stop bus services for minimizing operator and user costs under crowding conditions | |
Kim et al. | Integrated design framework for on-demand transit system based on spatiotemporal mobility patterns | |
CN116822729A (en) | Urban rail fast and slow vehicle scheduling strategy optimization method and system based on passenger flow demand | |
CN112632374B (en) | Resident trip mode selection analysis method considering customized buses | |
Brands | Multi-objective optimisation of multimodal passenger transportation networks | |
CN114925911A (en) | Self-adaptive dynamic scheduling method and system based on accurate passenger flow prediction of unmanned bus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||