CN112348258A - Shared bicycle predictive scheduling method based on deep Q network


Info

Publication number
CN112348258A
CN112348258A (application CN202011240256.1A; granted as CN112348258B)
Authority
CN
China
Prior art keywords
vehicle
model
time
network model
prediction
Prior art date
Legal status
Granted
Application number
CN202011240256.1A
Other languages
Chinese (zh)
Other versions
CN112348258B (en
Inventor
史明光
盛洲
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202011240256.1A priority Critical patent/CN112348258B/en
Publication of CN112348258A publication Critical patent/CN112348258A/en
Application granted granted Critical
Publication of CN112348258B publication Critical patent/CN112348258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06Q 10/06313: Resource planning in a project environment


Abstract

The invention discloses a shared bicycle predictive scheduling method based on a deep Q network, which comprises the following steps: 1. designing a simulation environment that mimics the actual dispatching of shared bicycles; 2. acquiring user information and constructing a user behavior data matrix; 3. training a prediction network model consisting of a linear regression model and an SVM model; 4. training a predictive scheduling model based on a deep Q network in combination with the prediction network model; 5. carrying out real-time scheduling with the trained model. Even when sufficient training data is lacking, the method can predict the vehicle demand of each vehicle-using area for a future time period in advance by combining the linear regression model, the SVM model and the deep reinforcement learning of the deep Q network, so that the shared bicycles in each area can be scheduled quickly and reasonably ahead of time.

Description

Shared bicycle predictive scheduling method based on deep Q network
Technical Field
The invention belongs to the field of shared bicycle scheduling, and particularly relates to a shared bicycle prediction scheduling method based on a deep Q network.
Background
With the progress of society, the sharing economy has become increasingly common in daily life. Shared bicycles largely solve the "last kilometer" problem of travel, but their unreasonable placement troubles managers at every level. The core problem is that the number of bicycles in each area does not match the demand for them, so idle bicycles pile up in some areas while none are available in others. How to allocate the shared resources of each area reasonably, and thereby avoid wasting them, has therefore long been a difficult problem for companies providing sharing services.
Conventional shared-bicycle scheduling is mostly based on distributing bicycles evenly across areas and rarely considers how demand differs between area types and time periods, which leads to shared bicycles piling up in some areas while others have none. Scheduling algorithms are widely applied elsewhere in daily life, for example elevator scheduling and time-slice-based round-robin scheduling, but with the rapid development of big data and artificial intelligence technology, a single traditional scheduling algorithm can no longer meet current requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a shared bicycle predictive scheduling method based on a deep Q network, so that the vehicle demand of each vehicle-using area in a future time period can be predicted in advance, the scheduling of shared bicycles becomes more reasonable and efficient, and the advantage of sharing is fully exploited.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a shared bicycle prediction scheduling method based on a deep Q network, which is characterized by comprising the following steps of:
Step 1: establishing a vehicle-using area model and an operating-environment model for the unmanned dispatching vehicle;
Step 2: collecting the daily behavior data of all users in vehicle-using area k, the number h_k of bicycles in area k, and the historical vehicle-using times t_k of area k together with the corresponding vehicle-using counts, wherein the j-th daily behavior record generated by user i in area k is denoted d_ij = (w_ij, τ_ij, r_ij, u_ij), in which w_ij is a one-hot encoded vector converted from the weather information of the j-th record generated by user i, τ_ij is the time information of that record, r_ij is its riding-road information, comprising starting point, end point and route choice, and u_ij is its vehicle-using information; the user behavior data matrix is then obtained as D = (d_ij)_{M×N};
Step 3: training a prediction network model consisting of a linear regression model and an SVM model:
Step 3.1: constructing a linear regression model, taking the historical vehicle-using times t_k of area k and the corresponding vehicle-using counts as input variables, and optimizing the hyper-parameters of the linear regression model until it converges, thereby obtaining a prediction model for the vehicle-using count n′_k of area k in the future time period;
Step 3.2: constructing an SVM model and training it with the user behavior data matrix D as the input variable, thereby obtaining a trained SVM model that outputs a classification result representing the vehicle-using demand of user i: if the result is 1, user i needs a bicycle, otherwise user i does not; the predicted vehicle-using amount n″_k of area k for the future time period is then computed from the classification results;
Step 3.3: the prediction result of the prediction network model, i.e. the predicted vehicle-using amount n_k of area k for the future time period, is obtained by a weighted combination of n′_k and n″_k;
Step 4: repeating step 2 and step 3 until the estimated vehicle-using count of every vehicle-using area has been calculated;
Step 5: defining the action instruction set a = {a_1, …, a_t, …, a_m}, where a_t = {η_t, κ_t} is the action information of the unmanned dispatching vehicle at time t, η_t is its direction of travel at time t, and κ_t indicates whether it picks up or puts down bicycles at time t; defining the state instruction set s = {s_0, …, s_t, …, s_m}, where s_t = {ρ_t, ι_t, μ_t} is the operating-environment state information at time t, ρ_t = (n_t1, …, n_tk, …, n_tX) gives the predicted vehicle-using amount n_tk of each area k at time t, X is the total number of vehicle-using areas, ι_t is the position information of the dispatching vehicle at time t, and μ_t is the position information of each user at time t;
Step 6: the reward function R is set using equation (1):
R = R_pr + R_a + R_n   (1)
In equation (1), R_pr is the reward function of the prediction network model:
[Equation (2), rendered as an image in the source: R_pr as a piecewise function of the prediction error, parameterized by ζ]
In equation (2), ζ is a reward/penalty coefficient with ζ ∈ (0, 1).
In equation (1), R_a is the fixed action reward of the unmanned dispatching vehicle:
[Equation (3), rendered as an image in the source: R_a defined via the constant e]
In equation (3), e is a constant.
In equation (1), R_n is the bicycle dispatch reward function:
[Equation (4), rendered as an image in the source: R_n defined via Δ, h_k, c_k, n_k, b and r]
In equation (4), Δ indicates whether the unmanned dispatching vehicle is parked in a designated parking area (Δ = 1 if parked in a designated area, Δ = 0 otherwise), h_k is the number of existing bicycles in area k, c_k is the number of bicycles put down or taken away by the dispatching vehicle in area k, b is a constant, and r is another reward/penalty coefficient with r ∈ (0, 1);
Step 7: setting the learning rate to α, the reward attenuation coefficient to γ, and the update frequency to T, and initializing t = 1;
Step 8: constructing a predictive scheduling model based on a deep Q network:
Step 8.1: constructing a prediction evaluation network model comprising an input layer, m_1 hidden layers, an FC layer and an output layer, and initializing its network parameters to θ_0 by Gaussian initialization;
Step 8.2: constructing a prediction target network model with the same structure as the prediction evaluation network model, and initializing its network parameters to θ*_0 by Gaussian initialization;
Step 9: optimizing the parameters of the prediction evaluation network model:
Step 9.1: calculating the auxiliary adjustment coefficient σ of the prediction evaluation network model using equation (5):
[Equation (5), rendered as an image in the source: σ defined via the error adjustment coefficient β and the maximum prediction error max_k(h_k + c_k − n_k)]
In equation (5), β is an error adjustment coefficient and max_k(h_k + c_k − n_k) is the maximum error of the predicted vehicle-using amount over all vehicle-using areas;
Step 9.2: calculating the cost function Ψ using equation (6):
Ψ = (1/m) Σ_{t=1}^{m} [Q(s_t, a_t) − Q(s_t, a_t; θ_t)]²   (6)
In equation (6), m is the total number of states in the operating-environment model of the unmanned dispatching vehicle, Q(s_t, a_t) is the true cumulative return at time t, Q(s_t, a_t; θ_t) is the cumulative return estimated by the prediction evaluation network model at time t, and θ_t are the network parameters at time t;
Step 9.3: updating the network parameters of the prediction evaluation network model using equation (7):
θ_{t+1} = θ_t + α[R_t + γ·Q(s_{t+1}, a_{t+1}; θ*_t) − Q(s_t, a_t; θ_t)]·∇_θ Q(s_t, a_t; θ_t)   (7)
In equation (7), θ_t are the parameters of the prediction evaluation network model at time t, θ*_t are the parameters of the prediction target network model at time t, R_t is the value of the reward function at time t, Q(s_{t+1}, a_{t+1}; θ*_t) is the prediction target network model's estimate of the true cumulative return at time t, and Q(s_t, a_t; θ_t) is the cumulative return estimated by the prediction evaluation network model at time t;
Step 10: according to the update frequency T, replacing the network parameters θ* of the prediction target network model with the parameters θ of the prediction evaluation network model at each update moment;
Step 11: assigning t + 1 to t and judging whether t > A holds, where A is a set threshold; if so, the optimal prediction target network model has been obtained; otherwise, returning to step 9.2 and continuing in sequence;
Step 12: using the optimal prediction target network model to schedule the number of bicycles in each vehicle-using area in real time.
Compared with the prior art, the invention has the beneficial effects that:
1. By combining prediction with scheduling, the deep-Q-network-based predictive scheduling method overcomes the lag of traditional scheduling algorithms and thereby greatly improves the utilization rate of shared bicycles;
2. By combining the advantages of the linear regression model and the SVM model, the prediction network model can perceive users' vehicle-using demand in advance, so bicycles can be scheduled into place before users actually need them, reducing waiting time;
3. By combining reinforcement learning with the prediction model, hyper-parameters can be optimized through the experience-replay style of learning even when training data is insufficient, which greatly reduces training cost, improves the efficiency of the model, greatly improves the scheduling efficiency and timeliness of shared bicycles, and reduces their scheduling cost.
Drawings
FIG. 1 is a diagram of a vehicle region model and an operating environment model of an unmanned dispatching transportation vehicle according to the present invention;
FIG. 2 is a graph of the average reward variation of the optimized optimal predicted objective network model according to the present invention;
FIG. 3 is a flowchart of the deep-Q-network-based shared bicycle predictive scheduling method according to the present invention.
Detailed Description
In this embodiment, as shown in fig. 3, a shared bicycle prediction scheduling method based on a deep Q network predicts the vehicle demand of each vehicle area in a future time period in advance by combining a linear regression model, an SVM model and a deep reinforcement learning method of the deep Q network under the condition of lack of sufficient training data, and specifically includes the following steps:
Step 1: configuring the simulation environment with the Tkinter tool of the Python GUI library and establishing the vehicle-using area model and the operating-environment model of the unmanned dispatching vehicle. The area model simulates an urban environment with a 5 × 5 grid, in which A-F denote six different types of vehicle-using area (school, park, stadium, pedestrian street, office building and subway station); each area defines parking areas with their own parking upper limits, shown in grey and distinguished by number, so as to simulate the demand differences between area types. The total number of bicycles is assumed to be 100; at initialization the six areas are allocated 20, 10, 20 and 20 bicycles respectively, and the maximum capacities of the areas are 30, 20, 15, 25, 30 and 50. In the operating-environment model, the position of the unmanned dispatching vehicle is shown as a short black solid line, and its actions comprise moving up, down, left and right and putting down or taking away bicycles; 10 bicycles are allocated to it at initialization. Dotted lines simulate urban roads, blank areas are regions through which the dispatching vehicle is forbidden to pass, and the solid black lines around the perimeter form a boundary that the dispatching vehicle cannot cross, as shown in FIG. 1;
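The environment described above can be roughed out without the Tkinter rendering as a plain grid world. This is a minimal, hypothetical sketch: the grid size, the six zone types and capacities, the truck's action set and its initial load of 10 follow the text, while the class and method names and the placement of zones on the grid are invented for illustration.

```python
# Minimal sketch of the 5x5 grid world described above (no Tkinter rendering).
# Zone letters, capacities and the truck's initial load follow the text;
# everything else (names, zone layout) is illustrative.

class BikeGridWorld:
    ACTIONS = ("up", "down", "left", "right", "drop", "pick")

    def __init__(self):
        self.size = 5
        # Six zone types A-F: school, park, stadium, pedestrian street,
        # office building, subway station.
        self.capacity = {"A": 30, "B": 20, "C": 15, "D": 25, "E": 30, "F": 50}
        self.bikes = {k: 0 for k in self.capacity}   # bikes parked per zone
        self.truck_pos = (0, 0)                      # dispatch vehicle cell
        self.truck_load = 10                         # bikes on the truck

    def zone_at(self, pos):
        # Illustrative zone layout; the real layout is shown in FIG. 1.
        layout = {(0, 0): "A", (1, 1): "B", (2, 2): "C",
                  (3, 3): "D", (4, 4): "E", (0, 4): "F"}
        return layout.get(pos)

    def step(self, action):
        x, y = self.truck_pos
        if action == "up":
            self.truck_pos = (x, max(y - 1, 0))          # boundary clamp
        elif action == "down":
            self.truck_pos = (x, min(y + 1, self.size - 1))
        elif action == "left":
            self.truck_pos = (max(x - 1, 0), y)
        elif action == "right":
            self.truck_pos = (min(x + 1, self.size - 1), y)
        elif action == "drop":
            zone = self.zone_at(self.truck_pos)
            # Put down a bike only inside a zone that is under its cap.
            if zone and self.truck_load > 0 and self.bikes[zone] < self.capacity[zone]:
                self.bikes[zone] += 1
                self.truck_load -= 1
        elif action == "pick":
            zone = self.zone_at(self.truck_pos)
            if zone and self.bikes[zone] > 0:
                self.bikes[zone] -= 1
                self.truck_load += 1
        return self.truck_pos, self.truck_load
```

The boundary clamps mirror the rule that the dispatching vehicle cannot cross the perimeter.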
Step 2: collecting the daily behavior data of all users in vehicle-using area k, the number h_k of bicycles in area k, and the historical vehicle-using times t_k of area k together with the corresponding vehicle-using counts, wherein the j-th daily behavior record generated by user i in area k is denoted d_ij = (w_ij, τ_ij, r_ij, u_ij), in which w_ij is a one-hot encoded vector converted from the weather information of the j-th record generated by user i, τ_ij is the time information of that record, r_ij is its riding-road information, comprising starting point, end point and route choice, and u_ij is its vehicle-using information; the user behavior data matrix is then obtained as D = (d_ij)_{M×N};
Step 3: training a prediction network model consisting of a linear regression model and an SVM model:
Step 3.1: constructing a linear regression model f(t) = a·t² + b·t + c, where a, b and c are the three hyper-parameters adjusted during training; taking the historical vehicle-using times t_k of area k and the corresponding counts f(t_k) as input variables, the hyper-parameters are optimized until the model converges, yielding a prediction model for the vehicle-using count n′_k of area k in the future time period;
Step 3.2: constructing an SVM model δ_i = sign(ω*·d_ij + d*), where ω* and d* are the hyper-parameters to be adjusted; training it with the user behavior data matrix D as the input variable yields a trained SVM model whose classification result represents the vehicle-using demand of user i: if the result is 1, user i needs a bicycle, otherwise user i does not; the predicted vehicle-using amount n″_k of area k for the future time period is then computed from the classification results;
Step 3.3: the prediction result of the prediction network model, i.e. the predicted vehicle-using amount n_k of area k for the future time period, is obtained from the weighted formula n_k = 0.4·n′_k + 0.6·n″_k;
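Steps 3.1-3.3 can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the quadratic form f(t) = a·t² + b·t + c, the sign classifier and the 0.4/0.6 weighting come from the text, while the helper names and all of the toy data are invented.

```python
import numpy as np

# Sketch of the two-branch prediction in steps 3.1-3.3: a quadratic
# regression over historical (time, count) pairs, a sign(w . d + b)
# classifier summed over users, and the fixed 0.4/0.6 weighting of the
# two estimates.  All data below is made up.

def fit_quadratic(times, counts):
    # Least-squares fit of the hyper-parameters a, b, c of f(t) = a t^2 + b t + c.
    a, b, c = np.polyfit(times, counts, deg=2)
    return lambda t: a * t**2 + b * t + c

def svm_demand(D, w, bias):
    # Count users whose classification result is 1 ("needs a bike").
    labels = np.sign(D @ w + bias)
    return int(np.sum(labels == 1))

def predicted_demand(n_regression, n_svm):
    # Step 3.3: n_k = 0.4 * n'_k + 0.6 * n''_k
    return 0.4 * n_regression + 0.6 * n_svm

# Toy usage: counts generated from an exact quadratic so the fit is tight.
times = np.array([6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0])
counts = 0.5 * times**2 - 10 * times + 60
f = fit_quadratic(times, counts)
n_reg = f(20.0)                                      # estimate for a future time
D = np.array([[1.0, 0.2], [0.1, -0.5], [0.8, 0.9]])  # 3 users, 2 toy features
n_svm = svm_demand(D, w=np.array([1.0, 1.0]), bias=-0.5)
n_k = predicted_demand(n_reg, n_svm)
```

A trained SVM's ω* and bias would replace the hand-picked weights here; the combination step is unchanged.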
And 4, step 4: repeating the step 2 and the step 3 until the estimated vehicle using number of all vehicle using areas is calculated;
and 5: define action command set a ═ { a ═ a1,…,at,…am},atRepresents the action information of the unmanned dispatching transport vehicle at the time t, and at={ηtt},ηtThe direction information of the vehicle at time t includes upward driving, downward driving, leftward driving, rightward driving, and κtShowing whether the dispatching transport vehicle is folded or put down the single vehicle at the time t; define state instruction set s ═ s0,…,st,…sm},stRepresents the operating environment state information of the unmanned dispatching transport vehicle at the time t, and st={ρttt},ρtInformation indicating the number of vehicles per vehicle zone at time t, i.e. rhot=(nt1,…,ntk,…,ntX) Wherein n istkIndicating the predicted vehicle using amount of a t-time area k, X indicating the total vehicle using area number, iotatThe position information of the dispatching transport vehicle at the time t comprises coordinates corresponding to a horizontal axis and a vertical axis, mutThe position information of each user at the moment t is shown, namely the current area of each user and whether the vehicle is used or not;
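A minimal sketch of how the action record a_t = {η_t, κ_t} and the state record s_t = {ρ_t, ι_t, μ_t} might be represented in code; the field names and example values are invented stand-ins for the symbols above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Illustrative containers for the action and state records of step 5.
# Field names are stand-ins: direction ~ eta_t, bike_op ~ kappa_t,
# zone_counts ~ rho_t, truck_pos ~ iota_t, user_pos ~ mu_t.

@dataclass(frozen=True)
class Action:
    direction: str     # eta_t: "up" | "down" | "left" | "right"
    bike_op: str       # kappa_t: "pick" | "drop" | "none"

@dataclass
class State:
    zone_counts: Tuple[int, ...]          # rho_t = (n_t1, ..., n_tX)
    truck_pos: Tuple[int, int]            # iota_t: horizontal/vertical coords
    user_pos: Dict[str, Tuple[int, int]]  # mu_t: position per user

a_t = Action(direction="left", bike_op="drop")
s_t = State(zone_counts=(12, 5, 0, 7, 20, 31),
            truck_pos=(2, 3),
            user_pos={"u1": (0, 0), "u2": (4, 4)})
```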
Step 6: setting the reward function R using equation (1); while the unmanned dispatching vehicle operates in the network, the operating environment grants the corresponding reward according to this function, based on the action the vehicle takes and the resulting change in the environment state:
R = R_pr + R_a + R_n   (1)
In equation (1), R_pr is the reward function of the prediction network model:
[Equation (2), rendered as an image in the source: R_pr as a piecewise function of the prediction error, parameterized by ζ]
In equation (2), ζ is a reward/penalty coefficient with ζ ∈ (0, 1).
In equation (1), R_a is the fixed action reward of the unmanned dispatching vehicle:
[Equation (3), rendered as an image in the source: R_a defined via the constant e]
In equation (3), e is a constant.
In equation (1), R_n is the bicycle dispatch reward function:
[Equation (4), rendered as an image in the source: R_n defined via Δ, h_k, c_k, n_k, b and r]
In equation (4), Δ indicates whether the unmanned dispatching vehicle is parked in a designated parking area (Δ = 1 if parked in a designated area, Δ = 0 otherwise), h_k is the number of existing bicycles in area k, c_k is the number of bicycles put down or taken away by the dispatching vehicle in area k, b is a constant, and r is another reward/penalty coefficient with r ∈ (0, 1);
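Since equations (2)-(4) are only available as images in the source, the sketch below assumes plausible piecewise bodies for R_pr, R_a and R_n. Only the symbols and their roles (ζ, e, Δ, h_k, c_k, n_k, b, r) come from the text; every function body is a guess for illustration, not the patent's formula.

```python
# Assumed reward shapes for R = R_pr + R_a + R_n.  Only the symbols and
# their roles come from the text; the exact piecewise definitions are
# images in the source, so these bodies are guesses for illustration.

def r_pr(predicted, actual, zeta=0.5):
    # Prediction reward (eq. 2, assumed): penalize the demand-prediction
    # error, scaled by the reward/penalty coefficient zeta in (0, 1).
    return -zeta * abs(predicted - actual)

def r_a(e=1.0):
    # Fixed action reward (eq. 3, assumed): a constant cost per move.
    return -e

def r_n(delta, h_k, c_k, n_k, b=10.0, r=0.5):
    # Dispatch reward (eq. 4, assumed): bonus if parked in a designated
    # area (delta == 1) and the post-dispatch stock h_k + c_k is close to
    # the predicted demand n_k; penalty b otherwise.
    if delta == 0:
        return -b
    return b - r * abs(h_k + c_k - n_k)

def total_reward(predicted, actual, delta, h_k, c_k, n_k):
    # Equation (1): the three components are simply summed.
    return r_pr(predicted, actual) + r_a() + r_n(delta, h_k, c_k, n_k)
```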
Step 7: setting the learning rate to α, the reward attenuation coefficient to γ, the maximum number of iterations to A, and the update frequency to T, and initializing t = 1;
Step 8: constructing a predictive scheduling model based on a deep Q network:
Step 8.1: constructing a prediction evaluation network model comprising an input layer of 13 neurons, m_2 hidden layers of m_1 neurons each, an FC layer, and an output layer of 7 neurons, and initializing its network parameters to θ_0 by Gaussian initialization; during training, the time-series data corresponding to the environment state information is first converted into a tensor, the tensor is fed into the model for training, and the model outputs the action information of the unmanned dispatching vehicle;
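The evaluation network of step 8.1 can be sketched as a small Gaussian-initialized multilayer perceptron. The 13-neuron input, 7-neuron output and Gaussian initialization follow the text; the hidden sizes, the ReLU activations and the function names are assumptions made for illustration.

```python
import numpy as np

# Sketch of the evaluation network in step 8.1: 13 inputs (the encoded
# environment state), Gaussian-initialized weights, and 7 outputs (one
# Q-value per action).  Hidden sizes and activations are illustrative.

rng = np.random.default_rng(0)

def gaussian_layer(n_in, n_out, scale=0.1):
    # Gaussian initialization of a weight matrix, zero biases.
    return rng.normal(0.0, scale, size=(n_in, n_out)), np.zeros(n_out)

def build_network(hidden=(32, 32)):
    sizes = (13, *hidden, 7)
    return [gaussian_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # ReLU hidden layers, linear output layer (raw Q-values).
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

theta = build_network()
q_values = forward(theta, np.zeros(13))   # one Q-value per action
```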
Step 8.2: constructing a prediction target network model with the same structure as the prediction evaluation network model, and initializing its network parameters to θ*_0 by Gaussian initialization;
And step 9: optimizing, predicting and evaluating parameters of the network model:
step 9.1: calculating an auxiliary adjustment coefficient sigma of the prediction evaluation network model by using the formula (1):
Figure BDA0002768201430000073
in the formula (5), β represents an error adjustment coefficient, maxk(hk+ck-nk) The maximum error of the predicted vehicle consumption in all the vehicle consumption areas is represented;
step 9.2: the cost function Ψ is calculated using equation (6):
Figure BDA0002768201430000074
in the formula (6), m represents the total number of states in the model of the operating environment of the unmanned dispatching transport vehicle, Q(s)t,at) Representing the true cumulative reward at time t, Q(s)t,at;θt) Represents the cumulative return of the predicted evaluation network model estimate at time t, θtA network parameter representing time t;
step 9.3: updating the network parameters of the evaluation network model by using the formula (7):
Figure BDA0002768201430000081
in the formula (7), θtNetwork parameter, theta, of a predictive evaluation network model representing time tt *Network parameters, R, representing a predicted target network model at time ttThe value of the reward function, Q(s), at time tt+1,at+1;θt *) Estimate representing the true cumulative return of the predicted target network model at time t,Q(st,at;θt) Representing the accumulated return of the prediction evaluation network model estimation at the time t;
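The update of equations (6)-(7) can be illustrated on a toy linear Q-function with a separate target network. Only the TD structure (reward plus the γ-discounted target-network estimate, minus the evaluation estimate) and the periodic parameter copy of step 10 follow the text; the linear Q-function and all names and numbers are stand-ins for the deep network.

```python
import numpy as np

# Toy stand-in for equations (6)-(7): a linear Q-function
# Q(s, a; theta) = theta[a] . s, with a target network theta_star
# that is copied from theta every T steps.

def q(theta, s, a):
    return float(theta[a] @ s)

def td_update(theta, theta_star, s, a, reward, s_next, a_next,
              alpha=0.1, gamma=0.9):
    # Equation (7): theta <- theta + alpha * [R_t
    #   + gamma * Q(s', a'; theta*) - Q(s, a; theta)] * grad Q
    td_error = reward + gamma * q(theta_star, s_next, a_next) - q(theta, s, a)
    theta[a] = theta[a] + alpha * td_error * s   # grad of a linear Q is s
    return td_error

n_actions, dim = 7, 13
theta = np.zeros((n_actions, dim))        # evaluation network parameters
theta_star = theta.copy()                 # target network parameters

s = np.ones(dim)
err = td_update(theta, theta_star, s, a=2, reward=1.0, s_next=s, a_next=2)

# Cost (6) over a batch of transitions is the mean squared TD error.
cost = np.mean([err ** 2])

# Step 10: every T steps, copy the evaluation parameters into the target.
theta_star = theta.copy()
```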
Step 10: according to the update frequency T, replacing the network parameters θ* of the prediction target network model with the parameters θ of the prediction evaluation network model at each update moment;
Step 11: assigning t + 1 to t and judging whether t > A holds, where A is the set threshold; if so, the optimal prediction target network model has been obtained; otherwise, returning to step 9.2 and continuing in sequence. The average-reward curve of the resulting optimal prediction target network model over 100 iterative tests is shown in FIG. 2, with the horizontal axis giving the iteration count and the vertical axis the cumulative reward of the corresponding training period; the average reward finally stabilizes at about −100;
Step 12: using the optimal prediction target network model to schedule the number of bicycles in each vehicle-using area in real time.

Claims (1)

1. A shared bicycle predictive scheduling method based on a deep Q network is characterized by comprising the following steps:
Step 1: establishing a vehicle-using area model and an operating-environment model for the unmanned dispatching vehicle;
Step 2: collecting the daily behavior data of all users in vehicle-using area k, the number h_k of bicycles in area k, and the historical vehicle-using times t_k of area k together with the corresponding vehicle-using counts, wherein the j-th daily behavior record generated by user i in area k is denoted d_ij = (w_ij, τ_ij, r_ij, u_ij), in which w_ij is a one-hot encoded vector converted from the weather information of the j-th record generated by user i, τ_ij is the time information of that record, r_ij is its riding-road information, comprising starting point, end point and route choice, and u_ij is its vehicle-using information; the user behavior data matrix is then obtained as D = (d_ij)_{M×N};
And step 3: training a prediction network model consisting of a linear regression model and an SVM model:
step 3.1: constructing a linear regression model and using the historical vehicle-using time t of the vehicle-using region kkAnd the corresponding vehicle number is used as an input variable, the hyper-parameters of the linear regression model are optimized until the linear regression model converges, and the vehicle number n 'of the prediction model for predicting the future time period of the vehicle using area k is obtained'k
Step 3.2: constructing an SVM model, training the SVM model by taking the user behavior data matrix D as an input variable to obtain a trained SVM model for obtaining a classification result, wherein the classification result represents the vehicle using requirement of a user i, if the classification result is 1, the user i needs to use the vehicle, otherwise, the user i does not need to use the vehicle, and calculating the predicted vehicle using amount n' of a vehicle using region k in the future time period according to the classification resultk
Step 3.3: the vehicle using number n 'of future time period of the vehicle using area k'kAnd the predicted vehicle consumption n' of the future time periodkObtaining the prediction result of the prediction network model by weighted calculation, namely the predicted vehicle consumption n of the future time period of the vehicle area kk
Step 4: repeat step 2 and step 3 until the predicted vehicle-use counts of all vehicle-use regions have been calculated;
Step 5: define the action instruction set a = {a_1, …, a_t, …, a_m}, where a_t = {η_t, κ_t} represents the action information of the unmanned dispatch vehicle at time t: η_t denotes its direction information at time t, and κ_t denotes whether the dispatch vehicle picks up or puts down a bicycle at time t. Define the state instruction set s = {s_0, …, s_t, …, s_m}, where s_t = {ρ_t, ι_t, μ_t} represents the operating-environment state information of the unmanned dispatch vehicle at time t: ρ_t = (n_t1, …, n_tk, …, n_tX) denotes the vehicle-use counts of all regions at time t, where n_tk is the predicted vehicle-use count of region k at time t and X is the total number of vehicle-use regions; ι_t denotes the position information of the dispatch vehicle at time t; and μ_t denotes the position information of each user at time t;
Step 6: set the reward function R using equation (1):

R = R_pr + R_a + R_n   (1)

In equation (1), R_pr denotes the reward function of the prediction network model, given by equation (2) [equation (2) appears only as an image in the original]; in equation (2), ζ denotes a reward-penalty coefficient with ζ ∈ (0, 1).
In equation (1), R_a denotes the fixed action reward of the unmanned dispatch vehicle, given by equation (3) [equation (3) appears only as an image in the original]; in equation (3), e is a constant.
In equation (1), R_n denotes the bicycle dispatch reward function, given by equation (4) [equation (4) appears only as an image in the original]; in equation (4), Δ indicates whether the unmanned dispatch vehicle is parked in the designated parking region (Δ = 1 means parked in the designated region, Δ = 0 means not parked in the designated region), h_k denotes the number of existing bicycles in vehicle-use region k, c_k denotes the number of bicycles put down or taken away by the unmanned dispatch vehicle in region k, b is a constant, and r denotes another reward-penalty coefficient with r ∈ (0, 1);
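A sketch of the composite reward of step 6. Equations (2)–(4) are given only as images in the original, so every branch condition below is an illustrative assumption; only the symbols (ζ, e, Δ, h_k, c_k, b, r) and the sum in equation (1) come from the text.

```python
def prediction_reward(n_pred, n_actual, zeta=0.5):
    """R_pr sketch: reward an accurate region prediction, penalize a miss.
    The branch condition is assumed (eq. (2) is an image in the source)."""
    return zeta if n_pred == n_actual else -zeta

def action_reward(moved, e=1.0):
    """R_a sketch: a fixed per-action reward/penalty (assumed form of eq. (3))."""
    return e if moved else -e

def dispatch_reward(delta, h_k, c_k, n_k, b=1.0, r=0.5):
    """R_n sketch: reward parking in the designated region and closing the gap
    between stock h_k + c_k and predicted demand n_k (assumed form of eq. (4))."""
    if delta == 1:
        return b - r * abs(h_k + c_k - n_k)
    return -b

def total_reward(r_pr, r_a, r_n):
    """Equation (1): R = R_pr + R_a + R_n."""
    return r_pr + r_a + r_n
```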
Step 7: set the learning rate to α, the reward discount coefficient to γ, and the target-network update frequency to T; initialize t = 1;
Step 8: construct the predictive scheduling model based on a deep Q network:
Step 8.1: construct a prediction evaluation network model comprising an input layer, a hidden part of m_1 layers, an FC layer and an output layer, and initialize the network parameters of the prediction evaluation network model to θ_0 using Gaussian initialization;
Step 8.2: construct a prediction target network model with the same structure as the prediction evaluation network model, and initialize the network parameters of the prediction target network model to θ*_0 using Gaussian initialization;
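Steps 8.1–8.2 can be sketched with NumPy; the layer sizes, the initialization standard deviation, and the ReLU activation are illustrative assumptions (the patent specifies only the layer types and Gaussian initialization).

```python
import numpy as np

def gaussian_init_mlp(sizes, rng, std=0.1):
    """Create MLP parameters theta with Gaussian (normal) initialization."""
    return [(rng.normal(0.0, std, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(theta, x):
    """ReLU MLP forward pass; the final layer is linear (Q-values per action)."""
    for i, (W, b) in enumerate(theta):
        x = x @ W + b
        if i < len(theta) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 3]                 # state dim 4, 3 actions (assumed)
theta = gaussian_init_mlp(sizes, rng)  # evaluation network parameters, theta_0
# target network: identical structure, initialized to the same parameters
theta_star = [(W.copy(), b.copy()) for W, b in theta]
```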
Step 9: optimize the parameters of the prediction evaluation network model:
Step 9.1: calculate the auxiliary adjustment coefficient σ of the prediction evaluation network model using equation (5) [equation (5) appears only as an image in the original]; in equation (5), β denotes an error adjustment coefficient, and max_k(h_k + c_k − n_k) denotes the maximum error of the predicted vehicle-use count over all vehicle-use regions;
Step 9.2: calculate the cost function Ψ using equation (6):

Ψ = (1/m) · Σ_{t=1..m} [Q(s_t, a_t) − Q(s_t, a_t; θ_t)]²   (6)

In equation (6), m denotes the total number of states in the operating-environment model of the unmanned dispatch vehicle, Q(s_t, a_t) denotes the true cumulative return at time t, Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t, and θ_t denotes the network parameters at time t;
Step 9.3: update the network parameters of the prediction evaluation network model using equation (7):

θ_{t+1} = θ_t + α · [R_t + γ · Q(s_{t+1}, a_{t+1}; θ*_t) − Q(s_t, a_t; θ_t)] · ∇Q(s_t, a_t; θ_t)   (7)

In equation (7), θ_t denotes the network parameters of the prediction evaluation network model at time t, θ*_t denotes the network parameters of the prediction target network model at time t, R_t denotes the value of the reward function at time t, Q(s_{t+1}, a_{t+1}; θ*_t) denotes the estimate of the true cumulative return given by the prediction target network model at time t, and Q(s_t, a_t; θ_t) denotes the cumulative return estimated by the prediction evaluation network model at time t;
Step 10: at each update moment determined by the update frequency T, assign the network parameters θ of the prediction evaluation network model at that moment to the network parameters θ* of the prediction target network model;
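Steps 9.3 and 10 together form the standard DQN update with a periodically synchronized target network. A minimal tabular sketch (a NumPy Q-table standing in for the two networks, with assumed sizes and toy transitions):

```python
import numpy as np

def td_update(Q, Q_target, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One update in the spirit of eq. (7): move Q(s, a) toward the target
    R_t + gamma * max_a' Q_target(s', a')."""
    target = r + gamma * np.max(Q_target[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def maybe_sync(Q, Q_target, step, T=10):
    """Step 10: every T steps, copy the evaluation parameters into the target."""
    if step % T == 0:
        Q_target[:] = Q

n_states, n_actions = 5, 2  # assumed toy environment sizes
Q = np.zeros((n_states, n_actions))        # prediction evaluation "network"
Q_target = np.zeros_like(Q)                # prediction target "network"

for step in range(1, 101):
    # Toy transition with a constant reward of 1.0, for illustration only.
    s, a, r, s_next = step % n_states, step % n_actions, 1.0, (step + 1) % n_states
    td_update(Q, Q_target, s, a, r, s_next)
    maybe_sync(Q, Q_target, step)
```

Freezing the target between synchronizations is what stabilizes the bootstrapped update: the regression target changes only every T steps rather than on every parameter change.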
Step 11: assign t + 1 to t and judge whether t > A holds, where A is a set threshold; if so, the optimal prediction target network model has been obtained; otherwise, return to step 9.2 and continue in order;
Step 12: use the optimal prediction target network model to realize real-time scheduling of the number of bicycles in each vehicle-use region.
CN202011240256.1A 2020-11-09 2020-11-09 Shared bicycle predictive scheduling method based on deep Q network Active CN112348258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011240256.1A CN112348258B (en) 2020-11-09 2020-11-09 Shared bicycle predictive scheduling method based on deep Q network


Publications (2)

Publication Number Publication Date
CN112348258A true CN112348258A (en) 2021-02-09
CN112348258B CN112348258B (en) 2022-09-20

Family

ID=74430080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011240256.1A Active CN112348258B (en) 2020-11-09 2020-11-09 Shared bicycle predictive scheduling method based on deep Q network

Country Status (1)

Country Link
CN (1) CN112348258B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011790A (en) * 2021-04-23 2021-06-22 上海汽车集团股份有限公司 Shared automobile scheduling simulation method and device, electronic equipment and storage medium
CN113096426A (en) * 2021-03-29 2021-07-09 紫清智行科技(北京)有限公司 Dispatching scheduling method for shared automatic driving vehicle
CN113326993A (en) * 2021-04-20 2021-08-31 西南财经大学 Shared bicycle scheduling method based on deep reinforcement learning
CN113743797A (en) * 2021-09-08 2021-12-03 北京化工大学 Pile-free shared bicycle operation scheduling strategy optimization method and system based on big data
CN114298462A (en) * 2021-11-16 2022-04-08 武汉小安科技有限公司 Vehicle scheduling method and device, electronic equipment and storage medium
CN117217499A (en) * 2023-11-07 2023-12-12 南京职豆豆智能科技有限公司 Campus electric scooter dispatching optimization method based on multi-source data driving
CN118365501A (en) * 2024-06-17 2024-07-19 中南大学 Shared bicycle rebalancing method guided by balancing area division and rewarding mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146013A (en) * 2017-04-28 2017-09-08 国网北京市电力公司 A kind of classifying type electric automobile demand spatial and temporal distributions dynamic prediction method based on gray prediction and SVMs
WO2019050908A1 (en) * 2017-09-08 2019-03-14 Didi Research America, Llc System and method for ride order dispatching
CN110525428A (en) * 2019-08-29 2019-12-03 合肥工业大学 A kind of automatic parking method based on the study of fuzzy deeply
CN110766280A (en) * 2019-09-20 2020-02-07 南京领行科技股份有限公司 Vehicle scheduling method and generation method and device of target order prediction model
CN111461500A (en) * 2020-03-12 2020-07-28 北京航空航天大学 Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DIAN SHI: "Deep Q-Network Based Route Scheduling for Transportation Network Company Vehicles", 2018 IEEE Global Communications Conference (GLOBECOM) *
LIU Guannan et al.: "Dynamic ambulance relocation and dispatching based on deep reinforcement learning", Journal of Management Sciences in China *
YANG Jun et al.: "Shared-bicycle demand prediction based on the BP neural network algorithm", Western China Communications Science & Technology *
WANG Ning et al.: "Dispatching cost optimization of shared electric vehicles based on user incentives", Journal of Tongji University (Natural Science) *


Also Published As

Publication number Publication date
CN112348258B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN112348258B (en) Shared bicycle predictive scheduling method based on deep Q network
CN102044149B (en) City bus operation coordinating method and device based on time variant passenger flows
WO2021248607A1 (en) Deep reinforcement learning-based taxi dispatching method and system
CN110555990B (en) Effective parking space-time resource prediction method based on LSTM neural network
CN109389244B (en) GRU-based multi-factor perception short-term scenic spot visitor number prediction method
CN114723125B (en) Inter-city vehicle order allocation method combining deep learning and multitask optimization
JP3379983B2 (en) Artificial intelligence traffic modeling and prediction system
CN110458456B (en) Demand response type public transportation system scheduling method and system based on artificial intelligence
JP4870863B2 (en) Elevator group optimum management method and optimum management system
CN109508751B (en) Deep neural network model modeling method for high-speed railway train late time prediction
CN112417753B (en) Urban public transport resource-based joint scheduling method
CN113682908B (en) Intelligent scheduling method based on deep learning
CN112766591A (en) Shared bicycle scheduling method
CN114418606A (en) Network taxi appointment order demand prediction method based on space-time convolutional network
CN117474295B (en) Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method
CN116324838A (en) System and method for scheduling shared rides through a taxi calling platform
Guo et al. Rebalancing and charging scheduling with price incentives for car sharing systems
CN112613630B (en) Short-term traffic demand prediction method integrating multi-scale space-time statistical information
CN113743671B (en) High-speed rail express special train transportation network optimization method and system
CN113393111B (en) Cross-border transportation double-side connection vehicle scheduling method based on variable neighborhood tabu search algorithm
Wang Taxi scheduling research based on Q-learning
CN115103331B (en) Method and related device for determining working efficiency of road side unit
Wan et al. Dynamic Elevator Dispatching Via Deep Reinforcement Learning with Multi-Head Attention
CN117455084A (en) Empty taxi driving route recommendation method based on cloud edge end cooperation
Seyedabrishami et al. Short-term prediction of bus passenger demand, case study: Karimkhan bridge-Jomhoori square line

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant