CN110619442A - Vehicle berth prediction method based on reinforcement learning

Vehicle berth prediction method based on reinforcement learning

Info

Publication number
CN110619442A
Authority
CN
China
Prior art keywords
data
parking lot
network
prediction
berth
Prior art date
Legal status
Pending
Application number
CN201910916466.9A
Other languages
Chinese (zh)
Inventor
岑跃峰
张晨光
岑岗
马伟锋
程志刚
徐昶
周闻
王佳晨
蔡永平
张宇来
Current Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Science and Technology ZUST filed Critical Zhejiang University of Science and Technology ZUST
Priority to CN201910916466.9A priority Critical patent/CN110619442A/en
Publication of CN110619442A publication Critical patent/CN110619442A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/14 Traffic control systems for road vehicles indicating individual free spaces in parking areas
    • G08G1/145 Traffic control systems for road vehicles indicating individual free spaces in parking areas where the indication depends on the parking areas
    • G08G1/148 Management of a network of parking areas

Abstract

The invention discloses a vehicle berth prediction method based on reinforcement learning, which comprises the following steps: a. acquiring the historical parking space numbers of a target parking lot and normalizing them to form a data set, taking 60-90% of the data as a training set and 10-40% as a test set; b. building a parking lot berth prediction model based on reinforcement learning theory and inputting the training set into the model for training; c. predicting the vehicle berths of the target parking lot with the trained model, and verifying the model's prediction accuracy with the test set. The invention predicts the berth occupancy of a parking lot, and its prediction accuracy is high.

Description

Vehicle berth prediction method based on reinforcement learning
Technical Field
The invention relates to the fields of neural networks and reinforcement learning, in particular to a vehicle berth prediction method based on reinforcement learning.
Background
With the general rise of consumption levels in China, the number of motor vehicles owned by urban and rural residents has increased markedly, and parking has gradually become a prominent problem in people's daily life and work, especially in city centers. To ease the mismatch between berth supply and demand, many berth prediction methods apply artificial intelligence to the problem: accurate berth prediction makes it possible to judge the supply and demand of berths near a vehicle, give drivers reliable berth information, and relieve parking difficulties in cities and the countryside.
Traditional parking space prediction methods are mainly single-network prediction techniques or BP neural network methods. By processing a parking lot's berth data they can predict berths effectively over short horizons, but their capacity to handle data volatility is insufficient and their predictions are unstable. For example, the invention published as CN108648449A discloses a parking space prediction method combining Kalman filtering with a NAR neural network: the accuracy of the two prediction models is compared on the predicted data, and the number of times each is more accurate serves as its weight in the combined prediction, yielding the combined model's predicted value. However, many parking lots cannot provide the large amount of berth data such a combined model needs for weight selection and model fusion, which greatly reduces its practical value and makes it unsuitable for the early stage of berth prediction.
Disclosure of Invention
The invention aims to provide a vehicle berth prediction method based on reinforcement learning that predicts the berth occupancy of a parking lot with high accuracy.
The technical scheme of the invention is as follows: a vehicle berth prediction method based on reinforcement learning, carried out according to the following steps:
a. acquiring the historical parking space number of a target parking lot, carrying out normalization processing on the historical parking space number to form a data set, taking 60-90% of data in the data set as a training set, and taking 10-40% of data in the data set as a test set;
b. building a parking lot berth prediction model based on a reinforcement learning theory, and inputting a training set into the parking lot berth prediction model for training;
c. predicting the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and verifying the model's prediction accuracy with the test set.
In the vehicle berth prediction method based on reinforcement learning, the historical parking space number in step a is the number of parking spaces of the target parking lot over a certain time period; the historical numbers are normalized and compressed into real numbers between 0 and 1 by:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data.
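As a sketch, the min-max normalization above can be implemented directly (the sample counts below are illustrative, not taken from the patent's data):

```python
def normalize(values):
    """Min-max normalization: compress each datum d into [0, 1] via
    d* = (d - d_min) / (d_max - d_min)."""
    d_min, d_max = min(values), max(values)
    return [(d - d_min) / (d_max - d_min) for d in values]

raw = [120, 80, 200, 160]   # illustrative free-space counts
norm = normalize(raw)       # smallest count maps to 0.0, largest to 1.0
```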
In the vehicle berth prediction method based on reinforcement learning, the parking lot berth prediction model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
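The five-layer ReLU stacks above can be sketched as a small fully connected network; the layer widths and random initialization here are assumptions for illustration, not values stated in the patent:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, hidden, out):
    """Five ReLU hidden layers followed by a ReLU output layer,
    matching the recurrence h_k = relu(h_{k-1} * w_k + b_k)."""
    h = x
    for w, b in hidden:
        h = relu(h @ w + b)
    w_out, b_out = out
    return relu(h @ w_out + b_out)

rng = np.random.default_rng(0)
dims = [5, 16, 16, 16, 16, 16]   # input width + 5 hidden widths (assumed)
hidden = [(0.1 * rng.standard_normal((dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
          for i in range(5)]
out = (0.1 * rng.standard_normal((16, 1)), np.zeros(1))
y = forward(rng.standard_normal(5), hidden, out)   # one scalar berth prediction
```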
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
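A minimal numeric sketch of the clipped objective, assuming precomputed probability ratios and advantage estimates (the sample values are illustrative):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate: truncate the ratio to [1 - eps, 1 + eps], take the
    elementwise minimum of the clipped and unclipped targets, average, and
    negate so the result can be minimized as a loss."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

r = np.array([0.9, 1.1, 1.5])      # illustrative ratios r_t(theta)
adv = np.array([1.0, -0.5, 2.0])   # illustrative advantage estimates
loss = ppo_clip_loss(r, adv)       # the ratio 1.5 is truncated to 1.2
```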
In the vehicle berth prediction method based on reinforcement learning, when the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
the Nadam random gradient descent algorithm is an Adam optimization algorithm with a Nesterov momentum term, and the calculation process is as follows:
wherein, gtIs the gradient at the present time t,the correction amounts for the first moment estimate and the second moment estimate of the gradient at time t respectively,is to gtThe amount of correction of (a) is,is momentum term m at time ttAverage value of (d); ξ is a positive number close to 0 but not 0, η is the Nadam algorithm learning rate, Δ θtI.e. the updated gradient value, uiThe momentum factor estimated for the moment of the first order at time i, i 1, 2.
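A simplified sketch of one Nadam step, assuming a constant momentum factor mu in place of the schedule u_i; the hyper-parameter values are common defaults, not taken from the patent:

```python
import numpy as np

def nadam_step(theta, g, state, t, eta=0.002, mu=0.975, nu=0.999, xi=1e-8):
    """One Nadam update: Adam with a Nesterov momentum term.
    Constant momentum factor mu stands in for the schedule u_i."""
    m, n = state
    m = mu * m + (1.0 - mu) * g          # first-moment estimate m_t
    n = nu * n + (1.0 - nu) * g * g      # second-moment estimate n_t
    g_hat = g / (1.0 - mu ** t)          # bias-corrected raw gradient
    m_hat = m / (1.0 - mu ** (t + 1))    # bias-corrected first moment
    n_hat = n / (1.0 - nu ** t)          # bias-corrected second moment
    m_bar = (1.0 - mu) * g_hat + mu * m_hat   # Nesterov look-ahead blend
    delta = -eta * m_bar / (np.sqrt(n_hat) + xi)
    return theta + delta, (m, n)

theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2))
for t in range(1, 101):
    g = 2.0 * theta                      # gradient of f(theta) = ||theta||^2
    theta, state = nadam_step(theta, g, state, t)
```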
In the vehicle berth prediction method based on reinforcement learning, the data set is represented as:
S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

The historical number set S of the target parking lot is preprocessed and divided into a training set and a test set. An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
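The slicing of the series into fixed-length input sequences can be sketched as a plain sliding window (the toy series is illustrative; whether the patent additionally wraps around the end of the series is not specified):

```python
def make_windows(series, batch_size):
    """Slice the series into input windows of length batch_size, each paired
    with the value at the next time step as the prediction target."""
    pairs = []
    for i in range(len(series) - batch_size):
        x = series[i:i + batch_size]   # input sequence
        y = series[i + batch_size]     # next-step berth value to predict
        pairs.append((x, y))
    return pairs

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
windows = make_windows(data, batch_size=3)
```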
In the vehicle berth prediction method based on reinforcement learning, after the trained parking lot berth prediction model predicts the vehicle berths of the target parking lot in step c, the prediction output value is inverse-normalized to give the final parking lot berth prediction data. The inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
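The inverse mapping is a one-liner; the bounds below are illustrative, not from the patent's data:

```python
def denormalize(z, z_min, z_max):
    """Map a model output in [0, 1] back to a berth count:
    z* = z * (z_max - z_min) + z_min."""
    return z * (z_max - z_min) + z_min

count = denormalize(0.25, z_min=80, z_max=200)   # 0.25 * 120 + 80 = 110.0
```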
Compared with the prior art, the invention provides a vehicle berth prediction method based on reinforcement learning. A corresponding berth prediction model is constructed, and the acquired historical parking space numbers of the target parking lot are normalized into a data set to reduce the influence of noise in subsequent training. The model trains and learns autonomously on the training portion of the data set, the trained model predicts the berth occupancy of the target parking lot, and the test set finally verifies the model's prediction accuracy. This greatly improves the prediction accuracy and robustness of the parking lot berth prediction model; the invention is highly practical, its predictions are stable and accurate, and it can fully inform users of the target parking lot's berth supply and demand. In addition, the Actor-Critic reinforcement learning framework is applied to vehicle berth prediction: the Actor network performs prediction on the model's training set, while the Critic network evaluates the Actor network's training objective during training and reports the advantage of the Actor network after each policy update, effectively improving the effectiveness and accuracy of model training. The invention also introduces the idea of objective clipping into training, which limits large policy updates, removes the need for model-fusion weight selection, simplifies the overall model structure, and improves convenience.
Furthermore, the method uses the Nadam algorithm for gradient optimization, which constrains the learning rate more strongly during training, influences the gradient update more directly, and keeps the update process stable.
Drawings
FIG. 1 is a flow chart of a prediction method of the present invention;
FIG. 2 is a block diagram of the system of the present invention;
FIG. 3 is a prediction graph of example 1 of the present invention, with a sequence length of 5 days;
FIG. 4 is a prediction graph of example 2 of the present invention, with a sequence length of 5 days.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not to be construed as limiting the invention.
Example 1: a vehicle berth prediction method based on reinforcement learning, shown in FIG. 1, carried out according to the following steps:
S1. Obtain the historical parking space numbers of the target parking lot: a web crawler collects the historical berth data of a parking lot in Hangzhou, Zhejiang, from 11:03 on 22 April 2019 to 15:01 on 3 June 2019, covering 12583 time points recorded to the second. The parking space number at each time point is taken as the historical berth data; 80% of the historical data is used as the training set and 20% as the test set;
S2. Normalize the training set and test set to form a data set, compressing the values into real numbers between 0 and 1; the normalization formula is:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data;
S3. Further preprocess the data set, which is represented as:

S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
A parking lot berth prediction model is constructed based on reinforcement learning theory; as shown in FIG. 2, the model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
When the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
the Nadam random gradient descent algorithm is an Adam optimization algorithm with a Nesterov momentum term, and the calculation process is as follows:
wherein, gtIs the gradient at the present time t,the correction amounts for the first moment estimate and the second moment estimate of the gradient at time t respectively,is to gtThe amount of correction of (a) is,is mtIs a positive number close to 0 but not 0, eta is the learning rate of the Nadam algorithm, Delta thetatI.e. the updated gradient value, ui(i 1, 2.. t, t +1) is the momentum factor estimated for the moment of unity at time i.
The input batch size batch_size of the parking lot berth prediction model is set to 5 and held fixed; data in the data set are cyclically sliced into input sequences used to predict the berth data at the next moment, and the training set is input into the model for training. The learning rates of the Actor and Critic networks are 0.0002 and 0.0001 respectively, and ε = 0.2;
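Collected in one place, the hyper-parameters Example 1 reports (the dictionary layout is ours, and the 80/20 split sizes are derived from the stated 12583 time points):

```python
config = {
    "batch_size": 5,      # input sequence length, held fixed
    "actor_lr": 2e-4,     # Actor network learning rate
    "critic_lr": 1e-4,    # Critic network learning rate
    "clip_eps": 0.2,      # PPO clipping hyper-parameter epsilon
    "train_frac": 0.8,    # 80% training / 20% test split
}

n_points = 12583                                  # time points in the crawl
n_train = int(n_points * config["train_frac"])    # training samples
n_test = n_points - n_train                       # test samples
```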
S4. Predict the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and inverse-normalize the prediction output value to give the final parking lot berth prediction data; the inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
Finally, the test set is used to verify the prediction accuracy of the parking lot berth prediction model.
Example 2: a vehicle berth prediction method based on reinforcement learning, carried out according to the following steps:
S1. Obtain the historical parking space numbers of the target parking lot: a web crawler collects the historical berth data of a parking lot in Jingdezhen, Jiangxi, from 22 April 2019 to 3 June 2019, covering 12583 time points recorded to the second. The parking space number at each time point is taken as the historical berth data; 80% of the data set is used as the training set and 20% as the test set;
S2. Normalize the training set and test set to form a data set, compressing the values into real numbers between 0 and 1; the normalization formula is:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data;
S3. Further preprocess the data set, which is represented as:

S = {x_i | i = 1, 2, ..., n}

where x_i is the i-th datum in the data set, n is the data length of the data set, and S is the set of historical parking space numbers.

An integer batch_size is chosen as the input batch size of the parking lot berth prediction model and held fixed; data in the data set are cyclically sliced into input sequences of this length, each used to predict the parking lot's berth data at the moment following the sequence.
A parking lot berth prediction model is constructed based on reinforcement learning theory; as shown in FIG. 2, the model adopts a PPO algorithm built on an Actor-Critic architecture. The Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function.
The calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
The calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
When the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm.
The Nadam stochastic gradient descent algorithm is the Adam optimization algorithm with a Nesterov momentum term; its calculation process is as follows:

m_t = u_t * m_{t-1} + (1 - u_t) * g_t
n_t = ν * n_{t-1} + (1 - ν) * g_t^2
ĝ_t = g_t / (1 - Π_{i=1..t} u_i)
m̂_t = m_t / (1 - Π_{i=1..t+1} u_i)
n̂_t = n_t / (1 - ν^t)
m̄_t = (1 - u_t) * ĝ_t + u_{t+1} * m̂_t
Δθ_t = -η * m̄_t / (√n̂_t + ξ)

where g_t is the gradient at the current time t; m̂_t and n̂_t are the bias-corrected first-moment and second-moment estimates of the gradient at time t (ν is the decay rate of the second-moment estimate); ĝ_t is the correction of g_t; m̄_t is the weighted average of the momentum term m_t at time t; ξ is a small positive number close to but not equal to 0; η is the learning rate of the Nadam algorithm; Δθ_t is the resulting parameter update; and u_i (i = 1, 2, ..., t, t+1) is the momentum factor of the first-moment estimate at time i.
The input batch size batch_size of the parking lot berth prediction model is set to 5 and held fixed; data in the data set are cyclically sliced into input sequences used to predict the berth data at the next moment, and the training set is input into the model for training. The learning rates of the Actor and Critic networks are 0.0002 and 0.0001 respectively, and ε = 0.2;
S4. Predict the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and inverse-normalize the prediction output value to give the final parking lot berth prediction data; the inverse normalization is:

z* = z * (z_max - z_min) + z_min

where z is the prediction output value of the parking lot berth prediction model, z_max is the maximum value in the set of historical parking space numbers, z_min is the minimum value in that set, and z* is the target parking lot's berth prediction data after inverse normalization.
Finally, the test set is used to verify the prediction accuracy of the parking lot berth prediction model.
The applicant compared the predicted berth values of Examples 1 and 2 with the true values in their test sets; the results are shown in FIGS. 3-4. FIG. 3 plots the predicted berth values of Example 1 against the test set's true values, and FIG. 4 does the same for Example 2. As FIGS. 3 and 4 show, the berth predictions closely match the true values, the prediction accuracy is very good, and the berth supply and demand of the target parking lot can be fully provided to users.

Claims (6)

1. A vehicle berth prediction method based on reinforcement learning, characterized in that the method comprises the following steps:
a. acquiring the historical parking space number of a target parking lot, carrying out normalization processing on the historical parking space number to form a data set, taking 60-90% of data in the data set as a training set, and taking 10-40% of data in the data set as a test set;
b. building a parking lot berth prediction model based on a reinforcement learning theory, and inputting a training set into the parking lot berth prediction model for training;
c. predicting the vehicle berths of the target parking lot with the trained parking lot berth prediction model, and verifying the model's prediction accuracy with the test set.
2. The reinforcement learning-based vehicle berth prediction method of claim 1, characterized in that: the historical parking space number in step a is the number of parking spaces of the target parking lot over a certain time period; the historical numbers are normalized and compressed into real numbers between 0 and 1 by:

d* = (d - d_min) / (d_max - d_min)

where d is the original datum before normalization, d* is the normalized datum, d_max is the maximum value of the data in each field of the original data, and d_min is the minimum value of the data in each field of the original data.
3. The reinforcement learning-based vehicle berth prediction method of claim 2, characterized in that: the parking lot berth prediction model adopts a PPO algorithm built on an Actor-Critic architecture; the Actor-Critic architecture comprises an Actor network and a Critic network; the Critic network feeds back the quality of the Actor network's training through a value function;
the calculation process of the Critic network is as follows:

h_c,1 = relu(x_j * w_c,1 + b_c,1)
h_c,2 = relu(h_c,1 * w_c,2 + b_c,2)
...
h_c,5 = relu(h_c,4 * w_c,5 + b_c,5)
L_critic_j = relu(h_c,5 * w_c,out + b_c,out)
the calculation process of the Actor network is as follows:

h_a,1 = relu(x_j * w_a,1 + b_a,1)
h_a,2 = relu(h_a,1 * w_a,2 + b_a,2)
...
h_a,5 = relu(h_a,4 * w_a,5 + b_a,5)
L_actor_j = relu(h_a,5 * w_a,out + b_a,out)
where x_j is the j-th datum of the input sequence; w_c,i and b_c,i (i = 1, 2, ..., 5) are the weights and biases of the Critic network, and w_c,out and b_c,out are the weight and bias of the Critic network's output layer; L_critic_j is the value function used to judge how well the Actor network is training; w_a,i and b_a,i (i = 1, 2, ..., 5) are the weights and biases of the Actor network, and w_a,out and b_a,out are the weight and bias of the Actor network's output layer;
relu is selected as the activation function, and L_actor_j is the model's prediction output. r_t(θ) is the ratio of the model's new policy to its old policy at the current time t, where the new and old policies are the state values before and after a training update; θ is the vector of policy parameters being updated and represents a mapping relation; Â_t is the advantage of the policy update; and ε is a hyper-parameter of the PPO algorithm, taken as 0.2. The clipped objective

L^CLIP(θ) = E_t[min(r_t(θ) * Â_t, clip(r_t(θ), 1 - ε, 1 + ε) * Â_t)]

truncates the probability ratio r_t(θ) to the interval [1 - ε, 1 + ε], eliminating the incentive to move outside it; finally, the mean of the elementwise minimum of the truncated and untruncated targets is taken, and its negative is used as the model's loss function loss.
4. The reinforcement learning-based vehicle berth prediction method of claim 3, characterized in that when the Actor network and the Critic network compute the loss values of their respective loss functions, those loss values are optimized using the Nadam stochastic gradient descent algorithm;
the Nadam stochastic gradient descent algorithm is the Adam optimization algorithm with a Nesterov momentum term, and its calculation process is:

ĝt = gt / (1 − ∏i=1..t μi)
mt = μt·mt−1 + (1 − μt)·gt
m̂t = mt / (1 − ∏i=1..t+1 μi)
nt = ν·nt−1 + (1 − ν)·gt²
n̂t = nt / (1 − ν^t)
m̄t = (1 − μt)·ĝt + μt+1·m̂t
Δθt = −η·m̄t / (√n̂t + ε)

wherein gt is the gradient at the current time t; m̂t and n̂t are respectively the bias-corrected first-moment and second-moment estimates of the gradient at time t; ĝt is the bias correction of gt; m̄t is the momentum term mt at time t with the Nesterov correction applied; ε is a positive number close to but not equal to 0; η is the learning rate of the Nadam algorithm; Δθt is the gradient update applied to the parameters; μi is the momentum factor of the first-order moment estimate at time i, i = 1, 2, ..., and ν is the exponential decay rate of the second-order moment estimate.
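One Nadam parameter update can be sketched as below, under the simplifying assumption of a constant momentum schedule (all μi equal); the hyper-parameter values are illustrative, not taken from the patent:

```python
def nadam_step(theta, grad, m, n, t, lr=0.002, mu=0.9, nu=0.999, eps=1e-8):
    """One Nadam update: Adam's bias-corrected moments plus a
    Nesterov look-ahead blend of current gradient and momentum."""
    m = mu * m + (1.0 - mu) * grad            # first-moment estimate
    n = nu * n + (1.0 - nu) * grad ** 2       # second-moment estimate
    g_hat = grad / (1.0 - mu ** t)            # bias-corrected gradient
    m_hat = m / (1.0 - mu ** (t + 1))         # bias-corrected first moment
    n_hat = n / (1.0 - nu ** t)               # bias-corrected second moment
    m_bar = (1.0 - mu) * g_hat + mu * m_hat   # Nesterov momentum term
    theta = theta - lr * m_bar / (n_hat ** 0.5 + eps)
    return theta, m, n

# Usage: minimize f(theta) = theta^2 from theta = 1.0; the gradient
# is 2*theta, and theta should shrink toward 0 over the iterations.
theta, m, n = 1.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, n = nadam_step(theta, grad, m, n, t)
```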
5. The reinforcement learning-based vehicle berth prediction method according to any one of claims 1 to 4, characterized in that: the data set is represented as:
S={xi|i=1,2,...,n}
wherein xi represents the ith data item in the data set, n represents the data length of the data set, and S is the set of historical parking space counts;
an integer batch_size is taken as the input batch size of the parking lot berth prediction model; with batch_size held at a fixed value, data are circularly intercepted from the data set as input sequences, each used to predict the parking lot berth data at the moment immediately following the sequence.
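The circular interception of fixed-length input sequences amounts to sliding a window over the historical series; the berth counts and batch_size value here are hypothetical:

```python
def make_windows(series, batch_size):
    """Slide a fixed-length window over the berth series: each window
    of `batch_size` readings is an input sequence, and the reading
    that immediately follows it is the prediction target."""
    inputs, targets = [], []
    for i in range(len(series) - batch_size):
        inputs.append(series[i:i + batch_size])
        targets.append(series[i + batch_size])
    return inputs, targets

# Hypothetical hourly free-berth counts for one parking lot.
s = [120, 95, 80, 60, 45, 50, 70, 110]
xs, ys = make_windows(s, batch_size=3)
# First pair: input sequence [120, 95, 80] predicts target 60.
```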
6. The reinforcement learning-based vehicle berth prediction method according to claim 1, characterized in that: in step c, after the vehicle berth prediction for the target parking lot has been made with the trained parking lot berth prediction model, the prediction output value is mapped back to the original scale by inverse normalization to obtain accurate parking lot berth prediction data, the inverse normalization method being:
z* = z·(zmax − zmin) + zmin
wherein z is the prediction output value of the parking lot berth prediction model, zmax is the maximum value in the set of historical parking space counts, zmin is the minimum value in that set, and z* is the target parking lot berth prediction data obtained after the inverse normalization.
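The min-max inverse normalization of claim 6 can be sketched together with the forward transform it undoes; the history extremes and the predicted value below are hypothetical:

```python
def normalize(z, z_min, z_max):
    """Min-max scale a raw berth count into [0, 1] for model input."""
    return (z - z_min) / (z_max - z_min)

def denormalize(z, z_min, z_max):
    """Inverse transform of claim 6: z* = z * (z_max - z_min) + z_min."""
    return z * (z_max - z_min) + z_min

z_min, z_max = 0.0, 200.0  # hypothetical extremes of the berth history
pred = 0.45                # model output in normalized space
berths = denormalize(pred, z_min, z_max)  # back to a raw berth count
```

Round-tripping a raw value through both functions returns it unchanged, which is what makes the model's normalized output recoverable as an actual berth count.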
CN201910916466.9A 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning Pending CN110619442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910916466.9A CN110619442A (en) 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110619442A true CN110619442A (en) 2019-12-27

Family

ID=68924168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910916466.9A Pending CN110619442A (en) 2019-09-26 2019-09-26 Vehicle berth prediction method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110619442A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020198854A1 (en) * 2001-03-30 2002-12-26 Berenji Hamid R. Convergent actor critic-based fuzzy reinforcement learning apparatus and method
CN108961816A (en) * 2018-07-19 2018-12-07 泰华智慧产业集团股份有限公司 Road parking berth prediction technique based on optimization LSTM model
CN109741626A (en) * 2019-02-24 2019-05-10 苏州科技大学 Parking situation prediction technique, dispatching method and system
CN110059896A (en) * 2019-05-15 2019-07-26 浙江科技学院 A kind of Prediction of Stock Index method and system based on intensified learning
CN110136481A (en) * 2018-09-20 2019-08-16 初速度(苏州)科技有限公司 A kind of parking strategy based on deeply study

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325382A (en) * 2020-01-23 2020-06-23 北京百度网讯科技有限公司 Method and device for predicting free parking space of parking lot, electronic equipment and storage medium
CN111325382B (en) * 2020-01-23 2022-06-28 北京百度网讯科技有限公司 Method and device for predicting free parking space of parking lot, electronic equipment and storage medium
US11574259B2 (en) 2020-01-23 2023-02-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Parking lot free parking space predicting method, apparatus, electronic device and storage medium
CN113392979A (en) * 2020-03-11 2021-09-14 宏达国际电子股份有限公司 Reinforced learning system and training method
CN114054736A (en) * 2021-10-12 2022-02-18 中国重型机械研究院股份公司 Buggy ladle parking system and method
CN114054736B (en) * 2021-10-12 2022-10-18 中国重型机械研究院股份公司 Buggy ladle parking system and method
CN114596726A (en) * 2021-10-27 2022-06-07 西安理工大学 Parking position prediction method based on interpretable space-time attention mechanism
CN114596726B (en) * 2021-10-27 2024-01-19 西安华企众信科技发展有限公司 Parking berth prediction method based on interpretable space-time attention mechanism
CN114667852A (en) * 2022-03-14 2022-06-28 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
CN114667852B (en) * 2022-03-14 2023-04-14 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN110619442A (en) Vehicle berth prediction method based on reinforcement learning
CN112116080A (en) CNN-GRU water quality prediction method integrated with attention mechanism
CN110517482B (en) Short-term traffic flow prediction method based on 3D convolutional neural network
CN103839412B (en) A kind of crossing dynamic steering ratio combination method of estimation based on Bayes's weighting
CN109785618B (en) Short-term traffic flow prediction method based on combinational logic
CN114638440B (en) Charging load ultra-short-term prediction method based on charging pile utilization degree
CN108417032A (en) A kind of downtown area curb parking demand analysis prediction technique
Poonia et al. Short-term traffic flow prediction: using LSTM
CN112966871A (en) Traffic jam prediction method and system based on convolution long-short term memory neural network
CN114202316A (en) Urban rail transit train schedule optimization method based on deep reinforcement learning
Kofinas et al. Daily multivariate forecasting of water demand in a touristic island with the use of artificial neural network and adaptive neuro-fuzzy inference system
CN109583588A (en) A kind of short-term wind speed forecasting method and system
CN112116125A (en) Electric vehicle charging navigation method based on deep reinforcement learning
CN109558990B (en) Power distribution network disaster prevention backbone network frame planning method based on Steiner tree model
CN111311905A (en) Particle swarm optimization wavelet neural network-based expressway travel time prediction method
CN113935244A (en) Method and system for predicting short-term power load of urban rural distribution transformer
CN110110890A (en) Day wastewater quantity prediction method based on ELMAN neural network
CN115660293B (en) Comprehensive evaluation method for full life cycle of complex electromechanical product based on digital twin
CN116663419A (en) Sensorless equipment fault prediction method based on optimized Elman neural network
CN111724064A (en) Energy-storage-containing power distribution network planning method based on improved immune algorithm
CN106529713A (en) Grey GMDH network combination model-based wind speed prediction method and system
CN114777192B (en) Secondary network heat supply autonomous optimization regulation and control method based on data association and deep learning
CN109741597A (en) A kind of bus section runing time prediction technique based on improvement depth forest
Zhao et al. Wavelet embedded attentive Bi-LSTM for short-term passenger flow forecasting
CN111832873B (en) Pipe diameter determining method and system for water supply pipeline in old urban area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination