CN110619442A - Vehicle berth prediction method based on reinforcement learning

Vehicle berth prediction method based on reinforcement learning

Info

Publication number
CN110619442A
Authority
CN
China
Prior art keywords
data
parking lot
network
prediction
berth
Prior art date
Legal status
Pending
Application number
CN201910916466.9A
Other languages
Chinese (zh)
Inventor
岑跃峰
张晨光
岑岗
马伟锋
程志刚
徐昶
周闻
王佳晨
蔡永平
张宇来
Current Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Science and Technology ZUST filed Critical Zhejiang University of Science and Technology ZUST
Priority to CN201910916466.9A priority Critical patent/CN110619442A/en
Publication of CN110619442A publication Critical patent/CN110619442A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/14 Traffic control systems for road vehicles indicating individual free spaces in parking areas
    • G08G1/145 Traffic control systems for road vehicles indicating individual free spaces in parking areas where the indication depends on the parking areas
    • G08G1/148 Management of a network of parking areas

Abstract

The invention discloses a vehicle berth prediction method based on reinforcement learning, which comprises the following steps: a. acquiring the historical parking space numbers of a target parking lot and normalizing them to form a data set, taking 60-90% of the data as a training set and 10-40% as a test set; b. building a parking lot berth prediction model based on reinforcement learning theory and inputting the training set into the model for training; c. predicting the vehicle berths of the target parking lot with the trained model, and verifying the model's prediction accuracy with the test set. The invention predicts the berth occupancy of a parking lot, and its prediction accuracy is high.

Description

Vehicle berth prediction method based on reinforcement learning
Technical Field
The invention relates to the fields of neural networks and reinforcement learning, in particular to a vehicle berth prediction method based on reinforcement learning.
Background
With the general rise of consumption levels in China, the number of motor vehicles owned by urban and rural residents has increased markedly, and parking has gradually become a prominent problem in people's daily life and work, especially in city centers. To ease the mismatch between berth supply and demand, many berth prediction methods apply artificial intelligence to the problem: accurate berth prediction makes it possible to judge the supply and demand of berths near a vehicle, give drivers reliable berth information, and relieve parking difficulties in cities and the countryside.
Traditional parking space prediction methods are mainly single-network prediction techniques or BP neural network methods. By processing a parking lot's berth data they can predict berths effectively over short horizons, but their capacity to handle data volatility is insufficient and their predictions are unstable. For example, the invention published as CN108648449A discloses a parking space prediction method combining Kalman filtering with a NAR neural network: the accuracy of the two prediction models is compared on the predicted data, and the number of times each is more accurate serves as its weight in the combined prediction, yielding the combined model's predicted value. However, many parking lots cannot provide the large amount of berth data such a combined model needs for weight selection and model fusion, which greatly reduces its practical value and makes it unsuitable for the early stage of berth prediction.
Disclosure of Invention
The invention aims to provide a vehicle berth prediction method based on reinforcement learning that predicts the berth occupancy of a parking lot with high accuracy.
The technical scheme of the invention is as follows: a vehicle berth prediction method based on reinforcement learning, carried out according to the following steps:
a. acquiring the historical parking space number of a target parking lot, carrying out normalization processing on the historical parking space number to form a data set, taking 60-90% of data in the data set as a training set, and taking 10-40% of data in the data set as a test set;
b. building a parking lot berth prediction model based on a reinforcement learning theory, and inputting a training set into the parking lot berth prediction model for training;
c. predicting the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and verifying the model's prediction accuracy with the test set.
In the vehicle berth prediction method based on reinforcement learning, the historical parking space number in step a is the number of parking spaces of the target parking lot over a certain time period; the historical numbers are normalized and compressed into real numbers between 0 and 1 by:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data.
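As a sketch, the min-max normalization above can be implemented directly (the sample counts below are illustrative, not taken from the patent's data):

```python
def normalize(values):
    """Min-max normalization: compress each datum d into [0, 1] via
    d* = (d - d_min) / (d_max - d_min)."""
    d_min, d_max = min(values), max(values)
    return [(d - d_min) / (d_max - d_min) for d in values]

raw = [120, 80, 200, 160]   # illustrative free-space counts
norm = normalize(raw)       # smallest count maps to 0.0, largest to 1.0
```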
In the vehicle berth prediction method based on reinforcement learning, the parking lot berth prediction model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
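The five-layer ReLU stacks above can be sketched as a small fully connected network; the layer widths and random initialization here are assumptions for illustration, not values stated in the patent:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, hidden, out):
    """Five ReLU hidden layers followed by a ReLU output layer,
    matching the recurrence h_k = relu(h_{k-1} * w_k + b_k)."""
    h = x
    for w, b in hidden:
        h = relu(h @ w + b)
    w_out, b_out = out
    return relu(h @ w_out + b_out)

rng = np.random.default_rng(0)
dims = [5, 16, 16, 16, 16, 16]   # input width + 5 hidden widths (assumed)
hidden = [(0.1 * rng.standard_normal((dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
          for i in range(5)]
out = (0.1 * rng.standard_normal((16, 1)), np.zeros(1))
y = forward(rng.standard_normal(5), hidden, out)   # one scalar berth prediction
```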
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
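A minimal numeric sketch of the clipped objective, assuming precomputed probability ratios and advantage estimates (the sample values are illustrative):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate: truncate the ratio to [1 - eps, 1 + eps], take the
    elementwise minimum of the clipped and unclipped targets, average, and
    negate so the result can be minimized as a loss."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

r = np.array([0.9, 1.1, 1.5])      # illustrative ratios r_t(theta)
adv = np.array([1.0, -0.5, 2.0])   # illustrative advantage estimates
loss = ppo_clip_loss(r, adv)       # the ratio 1.5 is truncated to 1.2
```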
In the vehicle berth prediction method based on reinforcement learning, when the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
the Nadam random gradient descent algorithm is an Adam optimization algorithm with a Nesterov momentum term, and the calculation process is as follows:
wherein, gtIs the gradient at the present time t,the correction amounts for the first moment estimate and the second moment estimate of the gradient at time t respectively,is to gtThe amount of correction of (a) is,is momentum term m at time ttAverage value of (d); ξ is a positive number close to 0 but not 0, η is the Nadam algorithm learning rate, Δ θtI.e. the updated gradient value, uiThe momentum factor estimated for the moment of the first order at time i, i 1, 2.
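A simplified sketch of one Nadam step, assuming a constant momentum factor mu in place of the schedule u_i; the hyper-parameter values are common defaults, not taken from the patent:

```python
import numpy as np

def nadam_step(theta, g, state, t, eta=0.002, mu=0.975, nu=0.999, xi=1e-8):
    """One Nadam update: Adam with a Nesterov momentum term.
    Constant momentum factor mu stands in for the schedule u_i."""
    m, n = state
    m = mu * m + (1.0 - mu) * g          # first-moment estimate m_t
    n = nu * n + (1.0 - nu) * g * g      # second-moment estimate n_t
    g_hat = g / (1.0 - mu ** t)          # bias-corrected raw gradient
    m_hat = m / (1.0 - mu ** (t + 1))    # bias-corrected first moment
    n_hat = n / (1.0 - nu ** t)          # bias-corrected second moment
    m_bar = (1.0 - mu) * g_hat + mu * m_hat   # Nesterov look-ahead blend
    delta = -eta * m_bar / (np.sqrt(n_hat) + xi)
    return theta + delta, (m, n)

theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2))
for t in range(1, 101):
    g = 2.0 * theta                      # gradient of f(theta) = ||theta||^2
    theta, state = nadam_step(theta, g, state, t)
```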
In the vehicle berth prediction method based on reinforcement learning, the data set is represented as:
S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

The historical number set S of the target parking lot is preprocessed and divided into a training set and a test set. An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
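The slicing of the series into fixed-length input sequences can be sketched as a plain sliding window (the toy series is illustrative; whether the patent additionally wraps around the end of the series is not specified):

```python
def make_windows(series, batch_size):
    """Slice the series into input windows of length batch_size, each paired
    with the value at the next time step as the prediction target."""
    pairs = []
    for i in range(len(series) - batch_size):
        x = series[i:i + batch_size]   # input sequence
        y = series[i + batch_size]     # next-step berth value to predict
        pairs.append((x, y))
    return pairs

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
windows = make_windows(data, batch_size=3)
```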
In the vehicle berth prediction method based on reinforcement learning, after the trained parking lot berth prediction model predicts the vehicle berths of the target parking lot in step c, the prediction output value is inverse-normalized to give the final parking lot berth prediction data. The inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
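The inverse mapping is a one-liner; the bounds below are illustrative, not from the patent's data:

```python
def denormalize(z, z_min, z_max):
    """Map a model output in [0, 1] back to a berth count:
    z* = z * (z_max - z_min) + z_min."""
    return z * (z_max - z_min) + z_min

count = denormalize(0.25, z_min=80, z_max=200)   # 0.25 * 120 + 80 = 110.0
```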
Compared with the prior art, the invention provides a vehicle berth prediction method based on reinforcement learning. A corresponding berth prediction model is constructed, and the acquired historical parking space numbers of the target parking lot are normalized into a data set to reduce the influence of noise in subsequent training. The model trains and learns autonomously on the training portion of the data set, the trained model predicts the berth occupancy of the target parking lot, and the test set finally verifies the model's prediction accuracy. This greatly improves the prediction accuracy and robustness of the parking lot berth prediction model; the invention is highly practical, its predictions are stable and accurate, and it can fully inform users of the target parking lot's berth supply and demand. In addition, the Actor-Critic reinforcement learning framework is applied to vehicle berth prediction: the Actor network performs prediction on the model's training set, while the Critic network evaluates the Actor network's training objective during training and reports the advantage of the Actor network after each policy update, effectively improving the effectiveness and accuracy of model training. The invention also introduces the idea of objective clipping into training, which limits large policy updates, removes the need for model-fusion weight selection, simplifies the overall model structure, and improves convenience.
Furthermore, the method uses the Nadam algorithm for gradient optimization, which constrains the learning rate more strongly during training, influences the gradient update more directly, and keeps the update process stable.
Drawings
FIG. 1 is a flow chart of a prediction method of the present invention;
FIG. 2 is a block diagram of the system of the present invention;
FIG. 3 is a prediction graph of example 1 of the present invention, with a sequence length of 5 days;
FIG. 4 is a prediction graph of example 2 of the present invention, with a sequence length of 5 days.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not to be construed as limiting the invention.
Example 1: a vehicle berth prediction method based on reinforcement learning, shown in FIG. 1, carried out according to the following steps:
S1. Obtain the historical parking space numbers of the target parking lot: a web crawler collects the historical berth data of a parking lot in Hangzhou, Zhejiang, from 11:03 on 22 April 2019 to 15:01 on 3 June 2019, covering 12583 time points recorded to the second. The parking space number at each time point is taken as the historical berth data; 80% of the historical data is used as the training set and 20% as the test set;
S2. Normalize the training set and test set to form a data set, compressing the values into real numbers between 0 and 1; the normalization formula is:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data;
S3. Further preprocess the data set, which is represented as:

S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
A parking lot berth prediction model is constructed based on reinforcement learning theory; as shown in FIG. 2, the model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
When the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
the Nadam random gradient descent algorithm is an Adam optimization algorithm with a Nesterov momentum term, and the calculation process is as follows:
wherein, gtIs the gradient at the present time t,the correction amounts for the first moment estimate and the second moment estimate of the gradient at time t respectively,is to gtThe amount of correction of (a) is,is mtIs a positive number close to 0 but not 0, eta is the learning rate of the Nadam algorithm, Delta thetatI.e. the updated gradient value, ui(i 1, 2.. t, t +1) is the momentum factor estimated for the moment of unity at time i.
The input batch size batch_size of the parking lot berth prediction model is set to 5 and held fixed; data in the data set are cyclically sliced into input sequences used to predict the berth data at the next moment, and the training set is input into the model for training. The learning rates of the Actor and Critic networks are 0.0002 and 0.0001 respectively, and ε = 0.2;
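Collected in one place, the hyper-parameters Example 1 reports (the dictionary layout is ours, and the 80/20 split sizes are derived from the stated 12583 time points):

```python
config = {
    "batch_size": 5,      # input sequence length, held fixed
    "actor_lr": 2e-4,     # Actor network learning rate
    "critic_lr": 1e-4,    # Critic network learning rate
    "clip_eps": 0.2,      # PPO clipping hyper-parameter epsilon
    "train_frac": 0.8,    # 80% training / 20% test split
}

n_points = 12583                                  # time points in the crawl
n_train = int(n_points * config["train_frac"])    # training samples
n_test = n_points - n_train                       # test samples
```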
S4. Predict the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and inverse-normalize the prediction output value to give the final parking lot berth prediction data; the inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
Finally, the test set is used to verify the prediction accuracy of the parking lot berth prediction model.
Example 2: a vehicle berth prediction method based on reinforcement learning, carried out according to the following steps:
S1. Obtain the historical parking space numbers of the target parking lot: a web crawler collects the historical berth data of a parking lot in Jingdezhen, Jiangxi, from 22 April 2019 to 3 June 2019, covering 12583 time points recorded to the second. The parking space number at each time point is taken as the historical berth data; 80% of the data set is used as the training set and 20% as the test set;
S2. Normalize the training set and test set to form a data set, compressing the values into real numbers between 0 and 1; the normalization formula is:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data;
S3. Further preprocess the data set, which is represented as:

S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
A parking lot berth prediction model is constructed based on reinforcement learning theory; as shown in FIG. 2, the model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
When the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
The Nadam stochastic gradient descent algorithm is the Adam optimization algorithm with a Nesterov momentum term; its calculation process is as follows:

m_t = u_t * m_{t-1} + (1 - u_t) * g_t
n_t = ν * n_{t-1} + (1 - ν) * g_t^2
ĝ_t = g_t / (1 - Π_{i=1..t} u_i)
m̂_t = m_t / (1 - Π_{i=1..t+1} u_i)
n̂_t = n_t / (1 - ν^t)
m̄_t = (1 - u_t) * ĝ_t + u_{t+1} * m̂_t
Δθ_t = -η * m̄_t / (√n̂_t + ξ)

where g_t is the gradient at the current time t; m̂_t and n̂_t are the bias-corrected first-moment and second-moment estimates of the gradient at time t (ν is the decay rate of the second-moment estimate); ĝ_t is the correction of g_t; m̄_t is the weighted average of the momentum term m_t at time t; ξ is a small positive number close to but not equal to 0; η is the learning rate of the Nadam algorithm; Δθ_t is the resulting parameter update; and u_i (i = 1, 2, ..., t, t+1) is the momentum factor of the first-moment estimate at time i.
The input batch size batch_size of the parking lot berth prediction model is set to 5 and held fixed; data in the data set are cyclically sliced into input sequences used to predict the berth data at the next moment, and the training set is input into the model for training. The learning rates of the Actor and Critic networks are 0.0002 and 0.0001 respectively, and ε = 0.2;
S4. Predict the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and inverse-normalize the prediction output value to give the final parking lot berth prediction data; the inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
Finally, the test set is used to verify the prediction accuracy of the parking lot berth prediction model.
The applicant compared the predicted berth values of Examples 1 and 2 with the true values in their test sets; the results are shown in FIGS. 3-4. FIG. 3 plots the predicted berth values of Example 1 against the test set's true values, and FIG. 4 does the same for Example 2. As FIGS. 3 and 4 show, the berth predictions closely match the true values, the prediction accuracy is very good, and the berth supply and demand of the target parking lot can be fully provided to users.

Claims (6)

1. A vehicle berth prediction method based on reinforcement learning, characterized in that the method comprises the following steps:
a. acquiring the historical parking space number of a target parking lot, carrying out normalization processing on the historical parking space number to form a data set, taking 60-90% of data in the data set as a training set, and taking 10-40% of data in the data set as a test set;
b. building a parking lot berth prediction model based on a reinforcement learning theory, and inputting a training set into the parking lot berth prediction model for training;
c. predicting the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and verifying the model's prediction accuracy with the test set.
2. The reinforcement learning-based vehicle berth prediction method of claim 1, characterized in that: the historical parking space number in step a is the number of parking spaces of the target parking lot over a certain time period; the historical numbers are normalized and compressed into real numbers between 0 and 1 by:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data.
3. The reinforcement learning-based vehicle berth prediction method of claim 2, characterized in that: the parking lot berth prediction model adopts a PPO algorithm built on an Actor-Critic architecture; the Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function;
the calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
the calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
4. The reinforcement learning-based vehicle berth prediction method of claim 3, characterized in that when the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm;
the Nadam stochastic gradient descent algorithm is the Adam optimization algorithm with a Nesterov momentum term, and its calculation process is:

ĝt = gt / (1 − ∏i=1..t μi)
mt = μt·mt−1 + (1 − μt)·gt
m̂t = mt / (1 − ∏i=1..t+1 μi)
nt = ν·nt−1 + (1 − ν)·gt²
n̂t = nt / (1 − ν^t)
m̄t = (1 − μt)·ĝt + μt+1·m̂t
Δθt = −η·m̄t / (√n̂t + ε)

wherein gt is the gradient at the current time t; m̂t and n̂t are respectively the bias-corrected first-moment and second-moment estimates of the gradient at time t; ĝt is the bias correction of gt; m̄t is the momentum term mt at time t with the Nesterov correction applied; ε is a positive number close to but not equal to 0; η is the learning rate of the Nadam algorithm; Δθt is the gradient update applied to the parameters; μi is the momentum factor of the first-order moment estimate at time i, i = 1, 2, ..., and ν is the exponential decay rate of the second-order moment estimate.
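One Nadam parameter update can be sketched as below, under the simplifying assumption of a constant momentum schedule (all μi equal); the hyper-parameter values are illustrative, not taken from the patent:

```python
def nadam_step(theta, grad, m, n, t, lr=0.002, mu=0.9, nu=0.999, eps=1e-8):
    """One Nadam update: Adam's bias-corrected moments plus a
    Nesterov look-ahead blend of current gradient and momentum."""
    m = mu * m + (1.0 - mu) * grad            # first-moment estimate
    n = nu * n + (1.0 - nu) * grad ** 2       # second-moment estimate
    g_hat = grad / (1.0 - mu ** t)            # bias-corrected gradient
    m_hat = m / (1.0 - mu ** (t + 1))         # bias-corrected first moment
    n_hat = n / (1.0 - nu ** t)               # bias-corrected second moment
    m_bar = (1.0 - mu) * g_hat + mu * m_hat   # Nesterov momentum term
    theta = theta - lr * m_bar / (n_hat ** 0.5 + eps)
    return theta, m, n

# Usage: minimize f(theta) = theta^2 from theta = 1.0; the gradient
# is 2*theta, and theta should shrink toward 0 over the iterations.
theta, m, n = 1.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, n = nadam_step(theta, grad, m, n, t)
```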
5. The reinforcement learning-based vehicle berth prediction method according to any one of claims 1 to 4, characterized in that: the data set is represented as:
S={xi|i=1,2,...,n}
wherein xi represents the ith data item in the data set, n represents the data length of the data set, and S is the set of historical parking space counts;
an integer batch_size is taken as the input batch size of the parking lot berth prediction model; with batch_size held at a fixed value, data are circularly intercepted from the data set as input sequences, each used to predict the parking lot berth data at the moment immediately following the sequence.
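The circular interception of fixed-length input sequences amounts to sliding a window over the historical series; the berth counts and batch_size value here are hypothetical:

```python
def make_windows(series, batch_size):
    """Slide a fixed-length window over the berth series: each window
    of `batch_size` readings is an input sequence, and the reading
    that immediately follows it is the prediction target."""
    inputs, targets = [], []
    for i in range(len(series) - batch_size):
        inputs.append(series[i:i + batch_size])
        targets.append(series[i + batch_size])
    return inputs, targets

# Hypothetical hourly free-berth counts for one parking lot.
s = [120, 95, 80, 60, 45, 50, 70, 110]
xs, ys = make_windows(s, batch_size=3)
# First pair: input sequence [120, 95, 80] predicts target 60.
```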
6. The reinforcement learning-based vehicle berth prediction method according to claim 1, characterized in that: in step c, after the vehicle berth prediction for the target parking lot has been made with the trained parking lot berth prediction model, the prediction output value is mapped back to the original scale by inverse normalization to obtain accurate parking lot berth prediction data, the inverse normalization method being:
z* = z·(zmax − zmin) + zmin
wherein z is the prediction output value of the parking lot berth prediction model, zmax is the maximum value in the set of historical parking space counts, zmin is the minimum value in that set, and z* is the target parking lot berth prediction data obtained after the inverse normalization.
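The min-max inverse normalization of claim 6 can be sketched together with the forward transform it undoes; the history extremes and the predicted value below are hypothetical:

```python
def normalize(z, z_min, z_max):
    """Min-max scale a raw berth count into [0, 1] for model input."""
    return (z - z_min) / (z_max - z_min)

def denormalize(z, z_min, z_max):
    """Inverse transform of claim 6: z* = z * (z_max - z_min) + z_min."""
    return z * (z_max - z_min) + z_min

z_min, z_max = 0.0, 200.0  # hypothetical extremes of the berth history
pred = 0.45                # model output in normalized space
berths = denormalize(pred, z_min, z_max)  # back to a raw berth count
```

Round-tripping a raw value through both functions returns it unchanged, which is what makes the model's normalized output recoverable as an actual berth count.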
CN201910916466.9A 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning Pending CN110619442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910916466.9A CN110619442A (en) 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110619442A true CN110619442A (en) 2019-12-27

Family

ID=68924168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910916466.9A Pending CN110619442A (en) 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110619442A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020198854A1 (en) * 2001-03-30 2002-12-26 Berenji Hamid R. Convergent actor critic-based fuzzy reinforcement learning apparatus and method
CN108961816A (en) * 2018-07-19 2018-12-07 泰华智慧产业集团股份有限公司 Road parking berth prediction technique based on optimization LSTM model
CN109741626A (en) * 2019-02-24 2019-05-10 苏州科技大学 Parking situation prediction technique, dispatching method and system
CN110059896A (en) * 2019-05-15 2019-07-26 浙江科技学院 A kind of Prediction of Stock Index method and system based on intensified learning
CN110136481A (en) * 2018-09-20 2019-08-16 初速度(苏州)科技有限公司 A kind of parking strategy based on deeply study

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325382A (en) * 2020-01-23 2020-06-23 北京百度网讯科技有限公司 Method and device for predicting free parking space of parking lot, electronic equipment and storage medium
CN111325382B (en) * 2020-01-23 2022-06-28 北京百度网讯科技有限公司 Method and device for predicting free parking space of parking lot, electronic equipment and storage medium
US11574259B2 (en) 2020-01-23 2023-02-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Parking lot free parking space predicting method, apparatus, electronic device and storage medium
CN113392979A (en) * 2020-03-11 2021-09-14 宏达国际电子股份有限公司 Reinforced learning system and training method
CN114054736A (en) * 2021-10-12 2022-02-18 中国重型机械研究院股份公司 Buggy ladle parking system and method
CN114054736B (en) * 2021-10-12 2022-10-18 中国重型机械研究院股份公司 Buggy ladle parking system and method
CN114596726A (en) * 2021-10-27 2022-06-07 西安理工大学 Parking position prediction method based on interpretable space-time attention mechanism
CN114596726B (en) * 2021-10-27 2024-01-19 西安华企众信科技发展有限公司 Parking berth prediction method based on interpretable space-time attention mechanism
CN114667852A (en) * 2022-03-14 2022-06-28 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
CN114667852B (en) * 2022-03-14 2023-04-14 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN110619442A (en) Vehicle berth prediction method based on reinforcement learning
CN112116080A (en) CNN-GRU water quality prediction method integrated with attention mechanism
CN110517482B (en) Short-term traffic flow prediction method based on 3D convolutional neural network
CN103839412B (en) A kind of crossing dynamic steering ratio combination method of estimation based on Bayes's weighting
CN109785618B (en) Short-term traffic flow prediction method based on combinational logic
CN114638440B (en) Charging load ultra-short-term prediction method based on charging pile utilization degree
CN108417032A (en) A kind of downtown area curb parking demand analysis prediction technique
Poonia et al. Short-term traffic flow prediction: using LSTM
CN112966871A (en) Traffic jam prediction method and system based on convolution long-short term memory neural network
CN114202316A (en) Urban rail transit train schedule optimization method based on deep reinforcement learning
Kofinas et al. Daily multivariate forecasting of water demand in a touristic island with the use of artificial neural network and adaptive neuro-fuzzy inference system
CN109583588A (en) A kind of short-term wind speed forecasting method and system
CN112116125A (en) Electric vehicle charging navigation method based on deep reinforcement learning
CN109558990B (en) Power distribution network disaster prevention backbone network frame planning method based on Steiner tree model
CN111311905A (en) Particle swarm optimization wavelet neural network-based expressway travel time prediction method
CN113935244A (en) Method and system for predicting short-term power load of urban rural distribution transformer
CN110110890A (en) Day wastewater quantity prediction method based on ELMAN neural network
CN115660293B (en) Comprehensive evaluation method for full life cycle of complex electromechanical product based on digital twin
CN116663419A (en) Sensorless equipment fault prediction method based on optimized Elman neural network
CN111724064A (en) Energy-storage-containing power distribution network planning method based on improved immune algorithm
CN106529713A (en) Grey GMDH network combination model-based wind speed prediction method and system
CN114777192B (en) Secondary network heat supply autonomous optimization regulation and control method based on data association and deep learning
CN109741597A (en) A kind of bus section runing time prediction technique based on improvement depth forest
Zhao et al. Wavelet embedded attentive Bi-LSTM for short-term passenger flow forecasting
CN111832873B (en) Pipe diameter determining method and system for water supply pipeline in old urban area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination