CN112348258A - Shared bicycle predictive scheduling method based on deep Q network - Google Patents
Shared bicycle predictive scheduling method based on deep Q network
Info
- Publication number: CN112348258A
- Application number: CN202011240256.1A
- Authority: CN (China)
- Prior art keywords: vehicle, model, time, network model, prediction
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06Q10/06313—Resource planning in a project environment
Abstract
The invention discloses a shared-bicycle predictive scheduling method based on a deep Q network, comprising the following steps: 1. design a simulation environment that imitates the actual dispatching of shared bicycles; 2. acquire user information and construct a user behavior data matrix; 3. train a prediction network model consisting of a linear regression model and an SVM model; 4. train a predictive scheduling model based on a deep Q network in combination with the prediction network model; 5. carry out real-time scheduling with the trained model. By combining a linear regression model, an SVM model, and the deep reinforcement learning method of the deep Q network, the method can predict the bicycle demand of each vehicle area for a future time period in advance even when sufficient training data are lacking, so that the shared bicycles in each vehicle area can be scheduled quickly, reasonably, and ahead of time.
Description
Technical Field
The invention belongs to the field of shared bicycle scheduling, and particularly relates to a shared bicycle prediction scheduling method based on a deep Q network.
Background
With the progress of society, the sharing economy has become increasingly common in daily life. The emergence of shared bicycles has largely solved the "last kilometer" problem of people's travel, but the unreasonable placement of shared bicycles troubles managers at all levels. The main problem is that the number of bicycles in each area does not match the demand for them, so large numbers of idle bicycles pile up in some areas while no bicycle is available in others. How to reasonably allocate the shared resources of each area, and thereby avoid waste, has therefore long been a difficult problem for the companies providing shared services.
Conventional shared-bicycle scheduling is mostly based on the idea of distributing bicycles evenly over the areas, and rarely considers the differences in demand between different types of areas and different time periods, which is what causes shared bicycles to stack up in some areas while none are available in others. Scheduling algorithms are widely applied in everyday life, for example elevator scheduling and time-slice round-robin scheduling; with the vigorous development of big data and artificial intelligence technology, a traditional single scheduling algorithm can no longer meet current requirements.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a shared-bicycle predictive scheduling method based on a deep Q network, so that the bicycle demand of each vehicle area for the future time period can be predicted in advance, the scheduling of shared bicycles becomes more reasonable and efficient, and the advantage of sharing is brought into full play.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a shared bicycle prediction scheduling method based on a deep Q network, which is characterized by comprising the following steps of:
step 1: establishing a vehicle area model and an unmanned dispatching transport vehicle operation environment model;
step 2: collect the daily behavior data of all users in vehicle area k, the number of bicycles h_k in area k, and the historical vehicle-use times t_k of the shared-bicycle area k together with the corresponding vehicle-use counts. The j-th daily behavior record generated by user i in area k is denoted d_ij and comprises: a one-hot encoded vector converted from the weather information of the record; the time information of the record; the riding-route information of the record, including the start point, the end point, and the route choice; and the vehicle-use information of the record. The user behavior data matrix obtained is D = (d_ij)_{M×N};
step 3: train a prediction network model consisting of a linear regression model and an SVM model:
step 3.1: construct a linear regression model, take the historical vehicle-use times t_k of area k and the corresponding vehicle-use counts as input variables, and optimize the hyper-parameters of the linear regression model until it converges, obtaining a prediction model that predicts the vehicle-use count n'_k of area k for the future time period;
step 3.2: construct an SVM model and train it with the user behavior data matrix D as input variable, obtaining a trained SVM model that yields a classification result representing the vehicle-use demand of user i: if the classification result is 1, user i needs to use a bicycle, otherwise user i does not; from the classification results, compute the predicted vehicle use n''_k of area k for the future time period;
step 3.3: combine the vehicle-use count n'_k and the predicted vehicle use n''_k of the future time period of area k by weighted calculation to obtain the prediction result of the prediction network model, i.e. the predicted vehicle use n_k of area k for the future time period;
step 4: repeat step 2 and step 3 until the predicted vehicle use of every vehicle area has been calculated;
step 5: define the action instruction set a = {a_1, …, a_t, …, a_m}, where a_t = {η_t, κ_t} represents the action information of the unmanned dispatching transport vehicle at time t; η_t denotes the driving-direction information at time t, and κ_t denotes whether the dispatching vehicle picks up or puts down a bicycle at time t. Define the state instruction set s = {s_0, …, s_t, …, s_m}, where s_t = {ρ_t, ι_t, μ_t} represents the operating-environment state information of the unmanned dispatching transport vehicle at time t; ρ_t = (n_t1, …, n_tk, …, n_tX) denotes the vehicle-count information of each vehicle area at time t, where n_tk is the predicted vehicle use of area k at time t and X is the total number of vehicle areas; ι_t denotes the position information of the dispatching vehicle at time t; μ_t denotes the position information of each user at time t;
step 6: the reward function R is set using equation (1):
R=Rpr+Ra+Rn (1)
in formula (1), R_pr denotes the reward function of the prediction network model, given by formula (2), in which ζ denotes a reward-penalty coefficient with ζ ∈ (0, 1); R_a denotes the fixed action reward of the unmanned dispatching vehicle, given by formula (3), in which e is a constant; R_n denotes the bicycle-dispatch reward function, given by formula (4), in which Δ indicates whether the unmanned dispatching vehicle is parked in the designated parking area (Δ = 1 if parked in the designated area, Δ = 0 otherwise), h_k denotes the number of existing bicycles in vehicle area k, c_k denotes the number of bicycles put down or taken away by the unmanned dispatching vehicle in area k, b is a constant, and r denotes another reward-penalty coefficient with r ∈ (0, 1);
step 7: set the learning rate to α, the reward decay coefficient to γ, and the update frequency to T, and initialize t = 1;
and 8: constructing a prediction scheduling model based on a deep Q network:
step 8.1: construct a prediction evaluation network model comprising an input layer, m_1 hidden layers, an FC layer, and an output layer, and initialize its network parameters to θ_0 by Gaussian initialization;
step 8.2: construct a prediction target network model with the same structure as the prediction evaluation network model, and initialize its network parameters by Gaussian initialization as well;
step 9: optimize the parameters of the prediction evaluation network model:
step 9.1: compute the auxiliary adjustment coefficient σ of the prediction evaluation network model using formula (5):
in formula (5), β denotes an error adjustment coefficient, and max_k(h_k + c_k − n_k) denotes the maximum error of the predicted vehicle use over all vehicle areas;
step 9.2: compute the cost function Ψ using formula (6):
in formula (6), m denotes the total number of states in the operating-environment model of the unmanned dispatching transport vehicle, Q(s_t, a_t) denotes the true cumulative return at time t, Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t, and θ_t denotes the network parameters at time t;
step 9.3: update the network parameters of the prediction evaluation network model using formula (7):
in formula (7), θ_t denotes the network parameters of the prediction evaluation network model at time t, θ_t* denotes the network parameters of the prediction target network model at time t, R_t is the value of the reward function at time t, Q(s_{t+1}, a_{t+1}; θ_t*) denotes the estimate of the true cumulative return given by the prediction target network model at time t, and Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t;
step 10: according to the update frequency T, use the network parameters θ* of the prediction target network model at the update moment to update the parameters θ of the prediction evaluation network model at the corresponding moment;
step 11: assign t + 1 to t and judge whether t > A holds, where A is a set threshold; if so, the optimal prediction target network model has been obtained; otherwise return to step 9.2 and continue in sequence;
step 12: use the optimal prediction target network model to schedule the number of bicycles in each vehicle area in real time.
Compared with the prior art, the invention has the following beneficial effects:
1. By combining prediction with scheduling, the shared-bicycle predictive scheduling method based on the deep Q network overcomes the lag of traditional scheduling algorithms, thereby greatly improving the utilization rate of shared bicycles;
2. By combining the advantages of the linear regression model and the SVM model, the prediction network model can perceive users' vehicle-use demand in advance, so bicycles can be scheduled into place before users actually need them, reducing users' waiting time;
3. By combining reinforcement learning with the prediction model and using experience-replay learning, the hyper-parameters can be optimized even when training data are insufficient, which greatly reduces the training cost, improves the efficiency of the model, greatly improves the scheduling efficiency and timeliness of shared bicycles, and reduces the scheduling cost.
Drawings
FIG. 1 is a diagram of a vehicle region model and an operating environment model of an unmanned dispatching transportation vehicle according to the present invention;
FIG. 2 is a graph of the average reward variation of the optimized optimal predicted objective network model according to the present invention;
fig. 3 is a flowchart of a method for predicting and scheduling a shared single vehicle based on a deep Q network according to the present invention.
Detailed Description
In this embodiment, as shown in fig. 3, a shared bicycle prediction scheduling method based on a deep Q network predicts the vehicle demand of each vehicle area in a future time period in advance by combining a linear regression model, an SVM model and a deep reinforcement learning method of the deep Q network under the condition of lack of sufficient training data, and specifically includes the following steps:
step 1: configure the simulation environment with the Tkinter tool in the Python GUI library and establish the vehicle-area model and the operating-environment model of the unmanned dispatching transport vehicle. The vehicle-area model is composed as follows: a 5 × 5 grid simulates the urban environment; A–F represent six different types of vehicle areas, namely school, park, stadium, pedestrian street, office building, and subway station; each vehicle area defines parking zones and parking upper limits, the parking zones being shown in grey and distinguished by numbers, so as to simulate the demand differences between area types; the total number of bicycles is assumed to be 100, of which 20, 10, 20, and 20 bicycles are allocated to the areas at initialization, and the maximum bicycle capacities of the six areas are 30, 20, 15, 25, 30, and 50. The operating-environment model of the unmanned dispatching transport vehicle comprises: the position of the unmanned dispatching vehicle, represented by a short solid black line; its actions, comprising moving up, down, left, and right, putting down a bicycle, and taking a bicycle away; 10 bicycles allocated to the dispatching vehicle at initialization; dotted lines simulating urban roads; blank areas through which the unmanned dispatching vehicle is forbidden to pass; and solid black lines around the periphery representing the boundary, which the dispatching vehicle cannot cross, as shown in figure 1;
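The grid environment of step 1 can be sketched in ordinary Python as follows. The class and method names are illustrative, and the last two per-area initial counts are an assumption (the text lists only four of the six values), chosen so that areas plus the truck's 10 bicycles total 100:

```python
class BikeEnv:
    AREAS = ["A", "B", "C", "D", "E", "F"]       # school, park, stadium, ...
    CAPACITY = [30, 20, 15, 25, 30, 50]          # max bikes per area (from the text)
    ACTIONS = ["up", "down", "left", "right", "drop", "pick"]

    def __init__(self):
        # 20, 10, 20, 20 are given in the text; the final 10, 10 are assumed
        # so that areas + truck load = 100 bikes in total.
        self.bikes = [20, 10, 20, 20, 10, 10]
        self.truck_load = 10                     # bikes on the dispatch truck
        self.truck_pos = (0, 0)                  # cell on the 5x5 grid

    def step(self, action, area=None):
        r, c = self.truck_pos
        if action == "up":
            self.truck_pos = (max(r - 1, 0), c)
        elif action == "down":
            self.truck_pos = (min(r + 1, 4), c)
        elif action == "left":
            self.truck_pos = (r, max(c - 1, 0))
        elif action == "right":
            self.truck_pos = (r, min(c + 1, 4))
        elif action == "drop" and area is not None and self.truck_load > 0 \
                and self.bikes[area] < self.CAPACITY[area]:
            self.bikes[area] += 1                # put one bicycle down
            self.truck_load -= 1
        elif action == "pick" and area is not None and self.bikes[area] > 0:
            self.bikes[area] -= 1                # take one bicycle away
            self.truck_load += 1
        return self.bikes, self.truck_pos, self.truck_load
```

The parking upper limits simply cap the "drop" action, mirroring the per-area capacities defined above.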
step 2: collect the daily behavior data of all users in vehicle area k, the number of bicycles h_k in area k, and the historical vehicle-use times t_k of the shared-bicycle area k together with the corresponding vehicle-use counts. The j-th daily behavior record generated by user i in area k is denoted d_ij and comprises: a one-hot encoded vector converted from the weather information of the record; the time information of the record; the riding-route information of the record, including the start point, the end point, and the route choice; and the vehicle-use information of the record. The user behavior data matrix obtained is D = (d_ij)_{M×N};
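The record structure d_ij of step 2 can be illustrated as a numeric feature vector. The weather vocabulary and the exact field layout are assumptions; the text fixes only the four components (one-hot weather, time, route, vehicle use):

```python
WEATHER = ["sunny", "cloudy", "rain", "snow"]    # assumed weather vocabulary

def one_hot(value, vocab):
    # one-hot encoding of a categorical value over a fixed vocabulary
    vec = [0] * len(vocab)
    vec[vocab.index(value)] = 1
    return vec

def encode_record(weather, hour, start, end, used_bike):
    # d_ij components: weather one-hot, time, route (start/end), usage label
    return (one_hot(weather, WEATHER)
            + [hour / 24.0]                      # time, normalised to [0, 1]
            + list(start) + list(end)            # route: start/end grid cells
            + [1 if used_bike else 0])           # vehicle-use information

row = encode_record("rain", 8, (0, 1), (3, 4), True)
```

Stacking one such row per record of each user yields the M × N behavior matrix D used to train the SVM in step 3.2.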
step 3: train a prediction network model consisting of a linear regression model and an SVM model:
step 3.1: construct the linear regression model f(t) = at² + bt + c, where a, b, and c are three hyper-parameters to be tuned during training; take the historical vehicle-use times t_k of area k and the corresponding vehicle-use counts f(t_k) as input variables and optimize the hyper-parameters of the linear regression model until it converges, thereby obtaining a prediction model that predicts the vehicle-use count n'_k of area k for the future time period;
step 3.2: construct the SVM model δ_i = sign(ω*·d_ij + d*), where ω* and d* denote the hyper-parameters to be tuned; train the SVM model with the user behavior data matrix D as input variable, obtaining a trained SVM model that yields a classification result representing the vehicle-use demand of user i: if the classification result is 1, user i needs to use a bicycle, otherwise user i does not; from the classification results, compute the predicted vehicle use n''_k of area k for the future time period;
step 3.3: combine the vehicle-use count n'_k and the predicted vehicle use n''_k of the future time period through the weighted formula n_k = 0.4 n'_k + 0.6 n''_k to obtain the prediction result of the prediction network model, i.e. the predicted vehicle use n_k of area k for the future time period;
step 4: repeat step 2 and step 3 until the predicted vehicle use of every vehicle area has been calculated;
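Steps 3.1–3.3 can be sketched as follows. `np.polyfit` stands in for fitting the quadratic f(t) = at² + bt + c, and a plain linear scorer stands in for the trained SVM δ_i = sign(ω·d_ij + d) (an off-the-shelf SVM implementation could be substituted); the 0.4/0.6 fusion follows the text:

```python
import numpy as np

def fit_regression(t_hist, n_hist):
    # np.polyfit returns (a, b, c) for f(t) = a*t^2 + b*t + c
    return np.polyfit(t_hist, n_hist, deg=2)

def predict_regression(coeffs, t_future):
    return float(np.polyval(coeffs, t_future))

def svm_like_demand(D, w, d):
    # classification result 1 => user needs a bike; count the positives
    scores = D @ w + d
    return int(np.sum(scores > 0))

def fuse(n_reg, n_svm):
    # weighted fusion n_k = 0.4 * n'_k + 0.6 * n''_k from step 3.3
    return 0.4 * n_reg + 0.6 * n_svm

# toy historical data for one area k (illustrative values)
t_hist = np.array([7, 8, 9, 10, 11], dtype=float)
n_hist = np.array([30, 50, 42, 25, 18], dtype=float)
coeffs = fit_regression(t_hist, n_hist)
n_reg = predict_regression(coeffs, 12.0)

D = np.array([[1.0, 0.2], [0.1, -0.5], [0.8, 0.9]])   # toy behavior matrix
n_svm = svm_like_demand(D, w=np.array([1.0, 1.0]), d=-0.5)
n_k = fuse(n_reg, n_svm)
```

Repeating this per area, as in step 4, yields the predicted vehicle use n_k for every vehicle area.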
step 5: define the action instruction set a = {a_1, …, a_t, …, a_m}, where a_t = {η_t, κ_t} represents the action information of the unmanned dispatching transport vehicle at time t; η_t denotes the driving-direction information at time t, comprising driving up, down, left, and right, and κ_t denotes whether the dispatching vehicle picks up or puts down a bicycle at time t. Define the state instruction set s = {s_0, …, s_t, …, s_m}, where s_t = {ρ_t, ι_t, μ_t} represents the operating-environment state information at time t; ρ_t = (n_t1, …, n_tk, …, n_tX) denotes the vehicle-count information of each vehicle area at time t, where n_tk is the predicted vehicle use of area k at time t and X is the total number of vehicle areas; ι_t denotes the position information of the dispatching vehicle at time t, comprising its horizontal and vertical coordinates; μ_t denotes the position information of each user at time t, i.e. each user's current area and whether the user is using a bicycle;
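The instruction sets of step 5 can be represented as simple containers; the class names and flattening layout below are illustrative, while the components η/κ and ρ/ι/μ follow the text:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Action:                   # a_t = {eta_t, kappa_t}
    eta: str                    # driving direction: "up" | "down" | "left" | "right"
    kappa: int                  # +1 pick up a bike, -1 put one down, 0 neither

@dataclass
class State:                    # s_t = {rho_t, iota_t, mu_t}
    rho: List[int]              # predicted vehicle use n_tk for each of the X areas
    iota: Tuple[int, int]       # dispatch-truck coordinates on the grid
    mu: List[Tuple[int, bool]]  # per user: (current area index, using a bike?)

def flatten(state: State) -> List[float]:
    # states are flattened to one numeric vector before entering the network
    out = list(map(float, state.rho)) + [float(state.iota[0]), float(state.iota[1])]
    for area, riding in state.mu:
        out += [float(area), 1.0 if riding else 0.0]
    return out

s = State(rho=[12, 7, 9], iota=(2, 3), mu=[(0, True), (2, False)])
```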
step 6: set the reward function R using formula (1); while the unmanned dispatching vehicle runs in the network, the operating environment gives the corresponding reward according to the reward function, the action taken by the vehicle, and the change of the environment state:
R=Rpr+Ra+Rn (1)
in formula (1), R_pr denotes the reward function of the prediction network model, given by formula (2), in which ζ denotes a reward-penalty coefficient with ζ ∈ (0, 1); R_a denotes the fixed action reward of the unmanned dispatching vehicle, given by formula (3), in which e is a constant; R_n denotes the bicycle-dispatch reward function, given by formula (4), in which Δ indicates whether the unmanned dispatching vehicle is parked in the designated parking area (Δ = 1 if parked in the designated area, Δ = 0 otherwise), h_k denotes the number of existing bicycles in vehicle area k, c_k denotes the number of bicycles put down or taken away by the unmanned dispatching vehicle in area k, b is a constant, and r denotes another reward-penalty coefficient with r ∈ (0, 1);
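A runnable sketch of the composite reward R = R_pr + R_a + R_n of formula (1). The piecewise bodies of formulas (2)–(4) are not reproduced in this text, so the concrete forms below are assumptions that use only the stated ingredients (ζ, e, Δ, h_k, c_k, n_k, b, r):

```python
def reward_pr(pred_error, zeta=0.5):
    # ASSUMED form of formula (2): penalise the prediction error with
    # coefficient zeta in (0, 1)
    return -zeta * abs(pred_error)

def reward_action(e=1.0):
    # fixed action reward of formula (3): a constant cost e per move
    return -e

def reward_dispatch(delta, h_k, c_k, n_k, b=10.0, r=0.5):
    # ASSUMED form of formula (4): bonus b when parked in the designated
    # area (delta == 1), minus r times the residual mismatch between the
    # post-dispatch stock h_k + c_k and the predicted demand n_k
    if delta == 1:
        return b - r * abs(h_k + c_k - n_k)
    return -b

def total_reward(pred_error, delta, h_k, c_k, n_k):
    # formula (1): R = R_pr + R_a + R_n
    return (reward_pr(pred_error) + reward_action()
            + reward_dispatch(delta, h_k, c_k, n_k))
```

The coefficient defaults (zeta, b, r, e) are placeholders; in the method they are tuned alongside training.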
step 7: set the learning rate to α, the reward decay coefficient to γ, the maximum number of iterations to A, and the update frequency to T, and initialize t = 1;
and 8: constructing a prediction scheduling model based on a deep Q network:
step 8.1: construct a prediction evaluation network model comprising an input layer containing 13 neurons, m_2 hidden layers each containing m_1 neurons, an FC layer, and an output layer containing 7 neurons, and initialize its network parameters to θ_0 by Gaussian initialization; during training, the time-series data corresponding to the environment-state information are first converted into a tensor, the resulting tensor is fed into the model, and the model outputs the action information of the unmanned dispatching transport vehicle;
step 8.2: construct a prediction target network model with the same structure as the prediction evaluation network model, and initialize its network parameters by Gaussian initialization as well;
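Step 8 can be sketched with a small NumPy multilayer perceptron; the 13 inputs and 7 outputs follow the text, while the hidden sizes and the standard deviation of the Gaussian initialisation are assumptions:

```python
import numpy as np

def init_net(sizes=(13, 32, 32, 7), rng=None):
    rng = rng or np.random.default_rng(0)
    # theta_0: Gaussian initialisation of every weight matrix, zero biases
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(theta, x):
    # simple MLP forward pass; the last layer emits one Q-value per action
    for i, (W, b) in enumerate(theta):
        x = x @ W + b
        if i < len(theta) - 1:
            x = np.maximum(x, 0.0)      # ReLU on hidden layers
    return x

eval_net = init_net()                    # prediction evaluation network
target_net = [(W.copy(), b.copy()) for W, b in eval_net]  # same-structure target
q = forward(eval_net, np.zeros(13))      # flattened state tensor in, Q-values out
```

Here the target network is initialised as a copy of the evaluation network, a common convention; the text only requires the same structure and Gaussian initialisation.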
step 9: optimize the parameters of the prediction evaluation network model:
step 9.1: compute the auxiliary adjustment coefficient σ of the prediction evaluation network model using formula (5):
in formula (5), β denotes an error adjustment coefficient, and max_k(h_k + c_k − n_k) denotes the maximum error of the predicted vehicle use over all vehicle areas;
step 9.2: compute the cost function Ψ using formula (6):
in formula (6), m denotes the total number of states in the operating-environment model of the unmanned dispatching transport vehicle, Q(s_t, a_t) denotes the true cumulative return at time t, Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t, and θ_t denotes the network parameters at time t;
step 9.3: update the network parameters of the prediction evaluation network model using formula (7):
in formula (7), θ_t denotes the network parameters of the prediction evaluation network model at time t, θ_t* denotes the network parameters of the prediction target network model at time t, R_t is the value of the reward function at time t, Q(s_{t+1}, a_{t+1}; θ_t*) denotes the estimate of the true cumulative return given by the prediction target network model at time t, and Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t;
step 10: according to the update frequency T, use the network parameters θ* of the prediction target network model at the update moment to update the parameters θ of the prediction evaluation network model at the corresponding moment;
step 11: assign t + 1 to t and judge whether t > A holds, where A is the set threshold; if so, the optimal prediction target network model has been obtained; otherwise return to step 9.2 and continue in sequence. The average-reward curve of the resulting optimal prediction target network model over 100 iterative tests is shown in figure 2, where the horizontal axis is the number of iterations and the vertical axis is the cumulative reward of the corresponding training period; the average reward finally stabilizes at about −100;
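The update of formula (7) and the periodic synchronization of step 10 can be illustrated in tabular form; a Q-table stands in for the network parameters so the arithmetic stays visible, and the sync direction shown is the conventional DQN one (evaluation → target):

```python
import numpy as np

def dqn_update(q_eval, q_target, s, a, r_t, s_next, a_next, alpha=0.1, gamma=0.9):
    # formula (7): theta_t <- theta_t + alpha * (R_t
    #              + gamma * Q(s', a'; theta*) - Q(s, a; theta))
    td_error = r_t + gamma * q_target[s_next, a_next] - q_eval[s, a]
    q_eval[s, a] += alpha * td_error
    return td_error

q_eval = np.zeros((4, 2))     # 4 toy states, 2 toy actions
q_target = np.zeros((4, 2))
T = 5                         # update frequency from step 7
for t in range(1, 21):
    dqn_update(q_eval, q_target, s=0, a=1, r_t=1.0, s_next=1, a_next=0)
    if t % T == 0:            # step 10: sync the two parameter sets every T steps
        q_target[:] = q_eval
```

With a fixed reward of 1, the tracked entry climbs toward the discounted fixed point while the target table lags by at most T steps, which is exactly the stabilising effect the two-network design is meant to provide.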
step 12: use the optimal prediction target network model to schedule the number of bicycles in each vehicle area in real time.
Claims (1)
1. A shared bicycle predictive scheduling method based on a deep Q network is characterized by comprising the following steps:
step 1: establishing a vehicle area model and an unmanned dispatching transport vehicle operation environment model;
step 2: collect the daily behavior data of all users in vehicle area k, the number of bicycles h_k in area k, and the historical vehicle-use times t_k of the shared-bicycle area k together with the corresponding vehicle-use counts. The j-th daily behavior record generated by user i in area k is denoted d_ij and comprises: a one-hot encoded vector converted from the weather information of the record; the time information of the record; the riding-route information of the record, including the start point, the end point, and the route choice; and the vehicle-use information of the record. The user behavior data matrix obtained is D = (d_ij)_{M×N};
step 3: train a prediction network model consisting of a linear regression model and an SVM model:
step 3.1: construct a linear regression model, take the historical vehicle-use times t_k of area k and the corresponding vehicle-use counts as input variables, and optimize the hyper-parameters of the linear regression model until it converges, obtaining a prediction model that predicts the vehicle-use count n'_k of area k for the future time period;
step 3.2: construct an SVM model and train it with the user behavior data matrix D as input variable, obtaining a trained SVM model that yields a classification result representing the vehicle-use demand of user i: if the classification result is 1, user i needs to use a bicycle, otherwise user i does not; from the classification results, compute the predicted vehicle use n''_k of area k for the future time period;
step 3.3: combine the vehicle-use count n'_k and the predicted vehicle use n''_k of the future time period of area k by weighted calculation to obtain the prediction result of the prediction network model, i.e. the predicted vehicle use n_k of area k for the future time period;
step 4: repeat step 2 and step 3 until the predicted vehicle use of every vehicle area has been calculated;
step 5: define the action instruction set a = {a_1, …, a_t, …, a_m}, where a_t = {η_t, κ_t} represents the action information of the unmanned dispatching transport vehicle at time t; η_t denotes the driving-direction information at time t, and κ_t denotes whether the dispatching vehicle picks up or puts down a bicycle at time t. Define the state instruction set s = {s_0, …, s_t, …, s_m}, where s_t = {ρ_t, ι_t, μ_t} represents the operating-environment state information of the unmanned dispatching transport vehicle at time t; ρ_t = (n_t1, …, n_tk, …, n_tX) denotes the vehicle-count information of each vehicle area at time t, where n_tk is the predicted vehicle use of area k at time t and X is the total number of vehicle areas; ι_t denotes the position information of the dispatching vehicle at time t; μ_t denotes the position information of each user at time t;
step 6: the reward function R is set using equation (1):
R=Rpr+Ra+Rn (1)
in formula (1), R_pr denotes the reward function of the prediction network model, given by formula (2), in which ζ denotes a reward-penalty coefficient with ζ ∈ (0, 1); R_a denotes the fixed action reward of the unmanned dispatching vehicle, given by formula (3), in which e is a constant; R_n denotes the bicycle-dispatch reward function, given by formula (4), in which Δ indicates whether the unmanned dispatching vehicle is parked in the designated parking area (Δ = 1 if parked in the designated area, Δ = 0 otherwise), h_k denotes the number of existing bicycles in vehicle area k, c_k denotes the number of bicycles put down or taken away by the unmanned dispatching vehicle in area k, b is a constant, and r denotes another reward-penalty coefficient with r ∈ (0, 1);
Step 7: set the learning rate α, the reward discount coefficient γ, and the target-update frequency T, and initialize t = 1;
Step 8: construct the predictive scheduling model based on a deep Q network:
Step 8.1: construct the prediction evaluation network model, comprising an input layer, a hidden block of m_1 layers, an FC layer, and an output layer, and initialize the network parameters of the prediction evaluation network model to θ_0 using Gaussian initialization;
Step 8.2: construct a prediction target network model with the same structure as the prediction evaluation network model, and initialize the network parameters of the prediction target network model to θ_0* using Gaussian initialization;
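Steps 8.1–8.2 amount to building two identically structured networks whose weights are drawn from a Gaussian. A minimal pure-Python sketch, with a single hidden layer standing in for the m_1-layer block and hypothetical layer sizes:

```python
import copy
import random

def init_mlp(sizes, seed=0):
    """Gaussian-initialized weights for a small MLP; sizes = [in, hidden, out]."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(sizes[i])]
             for _ in range(sizes[i + 1])]
            for i in range(len(sizes) - 1)]

def forward(params, x):
    """Plain forward pass: ReLU on hidden layers, linear output layer."""
    h = x
    for li, layer in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) for row in layer]
        if li < len(params) - 1:
            h = [max(0.0, v) for v in h]  # ReLU
    return h

theta_eval = init_mlp([4, 8, 3])          # prediction evaluation network, parameters theta_0
theta_target = copy.deepcopy(theta_eval)  # prediction target network: same structure, same init
q_values = forward(theta_eval, [0.1, 0.2, 0.3, 0.4])  # one Q-value per action
```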
Step 9: optimize the parameters of the prediction evaluation network model:
Step 9.1: calculate the auxiliary adjustment coefficient σ of the prediction evaluation network model using formula (5):
In formula (5), β denotes an error adjustment coefficient, and max_k(h_k + c_k − n_k) denotes the maximum error of the predicted bicycle demand over all usage areas;
Step 9.2: calculate the cost function Ψ using formula (6):
In formula (6), m denotes the total number of states in the operating-environment model of the unmanned dispatch vehicle, Q(s_t, a_t) denotes the true cumulative return at time t, Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t, and θ_t denotes the network parameters at time t;
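Given the terms named for formula (6), the cost is consistent with a mean-squared error between the true and estimated cumulative returns; a sketch under that assumption (the patent's exact form of formula (6) is not reproduced in this text):

```python
def cost(true_q, est_q):
    """Psi = (1/m) * sum_t (Q(s_t, a_t) - Q(s_t, a_t; theta_t))^2,
    assuming formula (6) is the usual mean-squared error over the m states."""
    m = len(true_q)
    return sum((q - qh) ** 2 for q, qh in zip(true_q, est_q)) / m

# Three states with estimation errors 0, +1, -1
psi = cost([1.0, 2.0, 3.0], [1.0, 1.0, 4.0])
```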
Step 9.3: update the network parameters of the prediction evaluation network model using formula (7):
In formula (7), θ_t denotes the network parameters of the prediction evaluation network model at time t, θ_t* denotes the network parameters of the prediction target network model at time t, R_t is the value of the reward function at time t, Q(s_{t+1}, a_{t+1}; θ_t*) denotes the prediction target network model's estimate of the true cumulative return at time t, and Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t;
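The terms named for formula (7) match the standard temporal-difference target R_t + γ·Q(s_{t+1}, a_{t+1}; θ_t*). A tabular simplification of the update (the patent's exact gradient form is not reproduced in this text):

```python
def td_update(q_eval, q_target, s, a, r_t, s_next, a_next,
              alpha=0.1, gamma=0.9):
    """Move Q(s_t, a_t; theta_t) toward R_t + gamma * Q(s_{t+1}, a_{t+1}; theta_t*).

    q_eval: table standing in for the evaluation network's Q-values.
    q_target: table standing in for the target network's Q-values.
    """
    td_target = r_t + gamma * q_target[(s_next, a_next)]
    td_error = td_target - q_eval[(s, a)]
    q_eval[(s, a)] += alpha * td_error  # learning-rate-scaled correction
    return td_error

q_eval = {("s0", "a0"): 0.0, ("s1", "a1"): 0.0}
q_target = dict(q_eval)
err = td_update(q_eval, q_target, "s0", "a0", r_t=1.0, s_next="s1", a_next="a1")
```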
Step 10: according to the update frequency T, at each update instant assign the parameters θ of the prediction evaluation network model to the parameters θ* of the prediction target network model;
Step 11: assign t + 1 to t and judge whether t > A holds, where A is a set threshold; if so, the optimal prediction target network model has been obtained; otherwise, return to step 9.2 and continue;
Step 12: use the optimal prediction target network model to schedule the number of bicycles in each usage area in real time.
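Steps 9–11 together form a loop: update the evaluation parameters every step, copy them into the target network every T steps, and stop once t exceeds the threshold A. A schematic skeleton, with the per-step optimization reduced to a placeholder callback:

```python
def train(theta_eval, update_step, T=4, A=10):
    """Schematic of steps 9-11: per-step evaluation update,
    target-network sync every T steps, stop when t > A."""
    theta_target = dict(theta_eval)  # step 8.2: target starts as a copy
    syncs = 0
    t = 1
    while t <= A:                    # step 11: loop until t > A
        update_step(theta_eval)      # step 9: optimize evaluation parameters
        if t % T == 0:               # step 10: sync target <- evaluation
            theta_target = dict(theta_eval)
            syncs += 1
        t += 1
    return theta_target, syncs

# Placeholder "optimization" that just increments one parameter each step
theta, n_syncs = train({"w": 0.0}, lambda p: p.__setitem__("w", p["w"] + 1.0))
```

With T = 4 and A = 10 the target network is synced at t = 4 and t = 8, so the returned parameters reflect the evaluation network as of step 8.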
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011240256.1A CN112348258B (en) | 2020-11-09 | 2020-11-09 | Shared bicycle predictive scheduling method based on deep Q network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112348258A true CN112348258A (en) | 2021-02-09 |
CN112348258B CN112348258B (en) | 2022-09-20 |
Family
ID=74430080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011240256.1A Active CN112348258B (en) | 2020-11-09 | 2020-11-09 | Shared bicycle predictive scheduling method based on deep Q network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348258B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113011790A (en) * | 2021-04-23 | 2021-06-22 | 上海汽车集团股份有限公司 | Shared automobile scheduling simulation method and device, electronic equipment and storage medium |
CN113096426A (en) * | 2021-03-29 | 2021-07-09 | 紫清智行科技(北京)有限公司 | Dispatching scheduling method for shared automatic driving vehicle |
CN113326993A (en) * | 2021-04-20 | 2021-08-31 | 西南财经大学 | Shared bicycle scheduling method based on deep reinforcement learning |
CN113743797A (en) * | 2021-09-08 | 2021-12-03 | 北京化工大学 | Pile-free shared bicycle operation scheduling strategy optimization method and system based on big data |
CN114298462A (en) * | 2021-11-16 | 2022-04-08 | 武汉小安科技有限公司 | Vehicle scheduling method and device, electronic equipment and storage medium |
CN117217499A (en) * | 2023-11-07 | 2023-12-12 | 南京职豆豆智能科技有限公司 | Campus electric scooter dispatching optimization method based on multi-source data driving |
CN118365501A (en) * | 2024-06-17 | 2024-07-19 | 中南大学 | Shared bicycle rebalancing method guided by balancing area division and rewarding mechanism |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107146013A (en) * | 2017-04-28 | 2017-09-08 | 国网北京市电力公司 | A kind of classifying type electric automobile demand spatial and temporal distributions dynamic prediction method based on gray prediction and SVMs |
WO2019050908A1 (en) * | 2017-09-08 | 2019-03-14 | Didi Research America, Llc | System and method for ride order dispatching |
CN110525428A (en) * | 2019-08-29 | 2019-12-03 | 合肥工业大学 | A kind of automatic parking method based on the study of fuzzy deeply |
CN110766280A (en) * | 2019-09-20 | 2020-02-07 | 南京领行科技股份有限公司 | Vehicle scheduling method and generation method and device of target order prediction model |
CN111461500A (en) * | 2020-03-12 | 2020-07-28 | 北京航空航天大学 | Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning |
Non-Patent Citations (4)
Title |
---|
DIAN SHI: "Deep Q-Network Based Route Scheduling for Transportation Network Company Vehicles", 《2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM)》 * |
LIU, GUANNAN et al.: "Research on dynamic ambulance relocation and dispatching based on deep reinforcement learning", 《Journal of Management Sciences in China》 * |
YANG, JUN et al.: "Shared bicycle demand prediction based on the BP neural network algorithm", 《Western China Communications Science & Technology》 * |
WANG, NING et al.: "Dispatch cost optimization of shared electric vehicles based on user incentives", 《Journal of Tongji University (Natural Science)》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112348258B (en) | Shared bicycle predictive scheduling method based on deep Q network | |
CN102044149B (en) | City bus operation coordinating method and device based on time variant passenger flows | |
WO2021248607A1 (en) | Deep reinforcement learning-based taxi dispatching method and system | |
CN110555990B (en) | Effective parking space-time resource prediction method based on LSTM neural network | |
CN109389244B (en) | GRU-based multi-factor perception short-term scenic spot visitor number prediction method | |
CN114723125B (en) | Inter-city vehicle order allocation method combining deep learning and multitask optimization | |
JP3379983B2 (en) | Artificial intelligence traffic modeling and prediction system | |
CN110458456B (en) | Demand response type public transportation system scheduling method and system based on artificial intelligence | |
JP4870863B2 (en) | Elevator group optimum management method and optimum management system | |
CN109508751B (en) | Deep neural network model modeling method for high-speed railway train late time prediction | |
CN112417753B (en) | Urban public transport resource-based joint scheduling method | |
CN113682908B (en) | Intelligent scheduling method based on deep learning | |
CN112766591A (en) | Shared bicycle scheduling method | |
CN114418606A (en) | Network taxi appointment order demand prediction method based on space-time convolutional network | |
CN117474295B (en) | Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method | |
CN116324838A (en) | System and method for scheduling shared rides through a taxi calling platform | |
Guo et al. | Rebalancing and charging scheduling with price incentives for car sharing systems | |
CN112613630B (en) | Short-term traffic demand prediction method integrating multi-scale space-time statistical information | |
CN113743671B (en) | High-speed rail express special train transportation network optimization method and system | |
CN113393111B (en) | Cross-border transportation double-side connection vehicle scheduling method based on variable neighborhood tabu search algorithm | |
Wang | Taxi scheduling research based on Q-learning | |
CN115103331B (en) | Method and related device for determining working efficiency of road side unit | |
Wan et al. | Dynamic Elevator Dispatching Via Deep Reinforcement Learning with Multi-Head Attention | |
CN117455084A (en) | Empty taxi driving route recommendation method based on cloud edge end cooperation | |
Seyedabrishami et al. | Short-term prediction of bus passenger demand, case study: Karimkhan bridge-Jomhoori square line |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||