CN114550456B

CN114550456B - Urban traffic jam scheduling method based on reinforcement learning

Info

Publication number: CN114550456B
Application number: CN202210188427.3A
Authority: CN
Inventors: 肖友
Original assignee: Chongqing Changan Automobile Co Ltd
Current assignee: Chongqing Changan Automobile Co Ltd
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2023-07-04
Anticipated expiration: 2042-02-28
Also published as: CN114550456A

Abstract

The invention discloses an urban traffic jam scheduling method based on reinforcement learning. Real-time data on the number of vehicles, vehicle queues and traffic light states at an urban road intersection are acquired through an image sensor and an inductance sensor. A machine learning algorithm then combines these real-time data with intersection prior knowledge, namely road section limits and lane information obtained from the image information and from stored structured data, to form intersection road condition state data that serve as training data for a scheduling model. Using a reinforcement learning algorithm, the scheduling model is trained on the intersection road condition state data under intersection traffic safety criteria: a reward signal is calculated from a reward function and the passing effect of each lane of the intersection as fed back by the environment. Finally, the trained scheduling model takes intersection road condition state data as input and outputs a traffic light state instruction and the corresponding traffic light control signals.

Description

Urban traffic jam scheduling method based on reinforcement learning

Technical Field

The invention relates to the field of intelligent traffic, in particular to an urban traffic jam scheduling method based on reinforcement learning.

Background

With rising living standards and ongoing urbanization, and with the automobile becoming the principal means of transportation in nearly every household, urban traffic congestion has become increasingly severe. Traffic jams reduce social productivity, cause substantial economic losses, waste fuel and increase carbon dioxide emissions. Improving urban traffic efficiency through optimized traffic scheduling therefore plays an important role in modern transportation, and signalized intersections are the most common efficiency bottleneck on urban road sections.

Existing traffic light control methods fall into two main categories. The first is rule-based traditional signal control, such as fixed-duration, traffic-flow-based and lane-occupancy-based algorithms. These methods perceive the scene only partially, struggle to schedule traffic flow in complex scenarios, and yield low vehicle passing efficiency. The second is machine-learning-based adaptive control, such as reinforcement-learning-based traffic light scheduling. Reinforcement learning has performed well in fields such as games and optimized scheduling, and because it learns autonomously and improves its decisions through experience, it has attracted increasing attention in traffic light control in recent years.

Disclosure of Invention

In view of the above shortcomings of the prior art, the technical problem to be solved by the invention is how to provide a reinforcement-learning-based urban traffic jam scheduling method that improves urban vehicle passing efficiency and relieves traffic congestion.

In order to solve the technical problems, the invention adopts the following technical scheme:

An urban traffic jam scheduling method based on reinforcement learning comprises the following steps:

(1) Acquiring real-time data of vehicle quantity information, vehicle queuing information and traffic light states of an urban road intersection through an image sensor and an inductance sensor;

(2) Using a machine learning algorithm, forming intersection road condition state data, which serve as training data for the scheduling model, from the real-time data on vehicle numbers, vehicle queues and traffic light states, combined with intersection prior knowledge (road section limits and lane information) obtained from the image information and from stored structured data;

(3) Adopting a reinforcement learning algorithm: at a given moment, the scheduling model selects, according to the intersection road condition state data and the intersection traffic safety criterion, a traffic light state switching action from the action space of traffic light state switching; a reward signal is calculated from the reward function and the passing effect of each lane of the intersection as fed back by the environment; over multiple iterations the model learns to select actions that maximize the reward, thereby training the scheduling model;

(4) Taking intersection road condition state data as input, and outputting a traffic light state instruction and the corresponding traffic light control signals through the trained scheduling model.
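The description later names Q-learning as one option for step (3). The following is a minimal tabular Q-learning sketch of that step, under stated assumptions: the road condition state has already been discretized into a hashable key, actions are integer indices into the traffic light state space, and the caller passes the action indices permitted by the safety criterion. The class name and the hyperparameter values are hypothetical, not taken from the source.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


class TabularQAgent:
    """Minimal tabular Q-learning agent (hypothetical sketch, not the patent's implementation).

    States must be hashable, e.g. tuples of discretized road condition features;
    actions are integer indices into the traffic light state space.
    """

    def __init__(self, n_actions: int = 8, alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def select_action(self, state: Hashable, allowed: Sequence[int]) -> int:
        """Epsilon-greedy choice restricted to actions permitted by the safety criterion."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(allowed))
        values = self.q[state]
        return max(allowed, key=lambda a: values[a])  # ties go to the first allowed index

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable, next_allowed: Sequence[int]) -> float:
        """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[next_state][a] for a in next_allowed)
        td_error = reward + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td_error
        return td_error
```

Q-learning is off-policy: the update bootstraps from the best allowed next action even when exploration picks a different one. An on-policy temporal-difference variant (SARSA) would instead bootstrap from the action actually taken next.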

As an optimization, in step (1), the running speed of vehicles approaching the intersection is also acquired by a laser radar, and environmental state information of the intersection is acquired by a temperature sensor and a humidity sensor.

As an optimization, in step (2), data preprocessing (data cleaning and feature construction) is performed on the real-time data on vehicle numbers, vehicle queues and traffic light states, and any one of the CNN, MLP, GBDT and SVM machine learning algorithms is then used to extract structured real-time road condition features as input to the scheduling model.
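The data cleaning and feature construction stage of step (2) might look like the sketch below. The record fields, the light-state encoding and the cleaning rules (drop incomplete, negative or duplicate readings) are assumptions for illustration; the downstream learned feature extractor (CNN, MLP, GBDT or SVM) is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

VALID_LIGHT_STATES = {"red", "green", "yellow"}  # hypothetical encoding


@dataclass
class RawReading:
    lane_id: str
    timestamp: float
    vehicle_count: Optional[int]     # from image / inductance sensors; None if missing
    queue_length_m: Optional[float]  # vehicle queue length in metres; None if missing
    light_state: str


def clean(readings: List[RawReading]) -> List[RawReading]:
    """Data cleaning: drop incomplete, impossible and duplicate (lane, timestamp) readings."""
    seen = set()
    cleaned = []
    for r in sorted(readings, key=lambda r: (r.lane_id, r.timestamp)):
        if r.vehicle_count is None or r.queue_length_m is None:
            continue
        if r.vehicle_count < 0 or r.queue_length_m < 0 or r.light_state not in VALID_LIGHT_STATES:
            continue
        key = (r.lane_id, r.timestamp)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


def build_features(readings: List[RawReading]) -> Dict[str, Dict[str, float]]:
    """Feature construction: per-lane aggregates over the time window of cleaned readings."""
    by_lane: Dict[str, List[RawReading]] = {}
    for r in readings:
        by_lane.setdefault(r.lane_id, []).append(r)
    features = {}
    for lane, rs in by_lane.items():
        features[lane] = {
            "mean_count": mean(r.vehicle_count for r in rs),
            "max_queue_m": max(r.queue_length_m for r in rs),
            # rs is time-ordered (clean() sorts), so rs[-1] is the latest light state
            "is_green": 1.0 if rs[-1].light_state == "green" else 0.0,
        }
    return features
```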

As an optimization, in step (2), the intersection a priori knowledge includes road segment speed limit, steering limit, number of lanes, lane category, and traffic light switching duration.
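One way to hold the five items of intersection prior knowledge and flatten them for the model input is sketched below. The field names, units, the lane-category vocabulary and the count-per-category encoding are all assumptions; the source lists the items but not their representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical lane-category vocabulary; the source does not enumerate categories.
LANE_CATEGORIES = ("left", "straight", "right", "straight_right")


@dataclass(frozen=True)
class IntersectionPrior:
    """Static intersection prior knowledge listed in step (2)."""
    speed_limit_mps: float                           # road segment speed limit
    n_lanes: int                                     # number of lanes
    lane_categories: Tuple[str, ...]                 # one category per lane
    turn_restrictions: FrozenSet[str] = frozenset()  # steering limits, e.g. {"no_u_turn"}
    light_switch_duration_s: float = 3.0             # switching duration (assumed default)

    def __post_init__(self) -> None:
        if len(self.lane_categories) != self.n_lanes:
            raise ValueError("need exactly one lane category per lane")
        unknown = set(self.lane_categories) - set(LANE_CATEGORIES)
        if unknown:
            raise ValueError(f"unknown lane categories: {sorted(unknown)}")

    def encode(self) -> List[float]:
        """Flatten to numbers: scalar fields followed by a lane count per category."""
        counts = [float(self.lane_categories.count(c)) for c in LANE_CATEGORIES]
        return [self.speed_limit_mps, float(self.n_lanes),
                float(len(self.turn_restrictions)), self.light_switch_duration_s] + counts
```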

As an optimization, in step (3), the reinforcement learning algorithm comprises a Q-learning or temporal-difference (TD) algorithm. The input features of the reinforcement learning algorithm and the variables of the reward function are obtained from the intersection road condition state data of step (2). The input features comprise, for each lane of the intersection, the average vehicle speed, the number of vehicles, the vehicle positions, the number of lanes, the lane type, the weather state, the accident state and the passing efficiency, where the passing efficiency is calculated by formula (1). The variables of the reward function comprise the number of vehicles passing, the vehicle waiting time, the difference in average vehicle speed before and after passing, and whether the traffic lights are switched:

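$$\mathit{efficiency} = \frac{v_{\mathrm{car\_avg}}}{v_{\mathrm{lane\_speed\_limit}}} \qquad (1)$$

where $\mathit{efficiency}$ is the overall passing efficiency of the vehicles, $v_{\mathrm{car\_avg}}$ is the average speed of vehicles at the intersection, and $v_{\mathrm{lane\_speed\_limit}}$ is the upper speed limit at the intersection.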
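Reading formula (1) as the ratio of the average vehicle speed at the intersection to the upper speed limit, which is what the variable definitions indicate, the passing efficiency feature can be computed as in this sketch. The function name is hypothetical, and the handling of an empty intersection (returning 0) is an assumption, since the source does not cover that case.

```python
from typing import Sequence


def passing_efficiency(vehicle_speeds_mps: Sequence[float], speed_limit_mps: float) -> float:
    """Formula (1), read as efficiency = v_car_avg / v_lane_speed_limit."""
    if speed_limit_mps <= 0:
        raise ValueError("speed limit must be positive")
    if not vehicle_speeds_mps:
        return 0.0  # assumption: no vehicles observed, report 0 rather than divide by zero
    v_car_avg = sum(vehicle_speeds_mps) / len(vehicle_speeds_mps)
    return v_car_avg / speed_limit_mps
```

Because both speeds share the same unit, the result is dimensionless: values near 1 mean vehicles are moving close to the limit, values near 0 indicate congestion.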

As an optimization, in step (3), the intersection traffic safety criterion is a basic constraint on safe passage through the intersection, ensuring that traffic flows in different lanes do not collide.
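A safety criterion of this kind is often expressed as a conflict table between movements, which can then mask the action space. The sketch below assumes that form: a candidate traffic light state is a set of simultaneously green movements, and it is safe only if no pair of them appears in the conflict table. The movement names and the deliberately small, incomplete conflict table are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

# Hypothetical, incomplete conflict table for a four-arm intersection:
# pairs of movements whose paths cross and must never be green together.
CONFLICTS: Set[FrozenSet[str]] = {
    frozenset({"N_straight", "E_straight"}),
    frozenset({"N_straight", "W_straight"}),
    frozenset({"S_straight", "E_straight"}),
    frozenset({"S_straight", "W_straight"}),
    frozenset({"N_left", "S_straight"}),
    frozenset({"S_left", "N_straight"}),
    frozenset({"E_left", "W_straight"}),
    frozenset({"W_left", "E_straight"}),
}


def is_safe(green_movements: Iterable[str]) -> bool:
    """Safety criterion: no two simultaneously green movements may conflict."""
    return all(frozenset(pair) not in CONFLICTS
               for pair in combinations(green_movements, 2))


def safe_actions(candidate_states: List[Set[str]]) -> List[int]:
    """Indices of candidate traffic light states that satisfy the criterion (an action mask)."""
    return [i for i, state in enumerate(candidate_states) if is_safe(state)]
```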

As an optimization, in step (4), the intersection road condition state data and the intersection prior knowledge are input into the scheduling model to obtain the target state of the traffic lights; if the current traffic light state matches the target state, no switching action is performed, otherwise the traffic lights are switched to the target state.
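The switching rule of step (4) reduces to a comparison between the current and target states. In the sketch below, `send_signal` is a hypothetical callback standing in for whatever interface drives the signal controller; the source does not specify one.

```python
from typing import Callable, Hashable


def apply_target_state(current_state: Hashable, target_state: Hashable,
                       send_signal: Callable[[Hashable], None]) -> bool:
    """Step (4): switch only when the model's target differs from the current state.

    Returns True if a switching command was issued.
    """
    if target_state == current_state:
        return False            # already in the target state: no switching action
    send_signal(target_state)   # hypothetical controller interface
    return True
```

Skipping redundant commands keeps the controller from restarting a phase it is already in.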

In summary, the beneficial effects of the invention are as follows: by combining current intersection road condition information with a reinforcement learning algorithm, the invention overcomes the incomplete strategy inputs and inflexible control strategies of traditional scheduling algorithms, provides a solution for scheduling complex urban traffic networks, relieves traffic congestion and improves urban vehicle passing efficiency.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:

FIG. 1 is an overall control flow chart of intersection vehicle scheduling in the present invention;

FIG. 2 is an information flow chart of the reinforcement learning model in the present invention;

FIG. 3 is a diagram of the effective state space of the traffic lights in the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

As shown in FIGS. 1 and 2, the reinforcement-learning-based urban traffic jam scheduling method of this embodiment comprises steps (1) to (4) described above, implemented as follows.

In this embodiment, in step (1), the traveling speed of vehicles approaching the intersection is also obtained by a laser radar, and environmental state information of the intersection is obtained by a temperature sensor and a humidity sensor.

In this embodiment, in step (2), data preprocessing (data cleaning and feature construction) is performed on the real-time data on vehicle numbers, vehicle queues and traffic light states, and any one of the CNN, MLP, GBDT and SVM machine learning algorithms is then used to extract structured real-time road condition features as input to the scheduling model.

In this embodiment, in step (2), the intersection prior knowledge includes the road segment speed limit, steering limits, the number of lanes, the lane categories and the traffic light switching duration.

In this embodiment, in step (3), the reinforcement learning algorithm comprises a Q-learning or temporal-difference (TD) algorithm. The input features of the reinforcement learning algorithm and the variables of the reward function are obtained from the intersection road condition state data of step (2). The input features comprise, for each lane of the intersection, the average vehicle speed, the number of vehicles, the vehicle positions, the number of lanes, the lane type, the weather state, the accident state and the passing efficiency, where the passing efficiency is calculated by formula (1). The variables of the reward function comprise the number of vehicles passing, the vehicle waiting time, the difference in average vehicle speed before and after passing, and whether the traffic lights are switched.
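The source names the four reward variables but not how they are combined. A weighted linear combination is one common choice and is assumed in the sketch below: passed vehicles and speed gain are rewarded, waiting time and light switching are penalized. The weights and the function name are hypothetical and would need tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source lists the variables but not their combination.
    passed: float = 1.0
    waiting: float = 0.1
    speed_gain: float = 0.5
    switch_penalty: float = 2.0


def reward(n_passed: int, total_wait_s: float, v_avg_before: float, v_avg_after: float,
           switched: bool, w: RewardWeights = RewardWeights()) -> float:
    """Assumed linear reward over the four variables named in the text."""
    return (w.passed * n_passed                            # number of vehicles passing
            - w.waiting * total_wait_s                     # vehicle waiting time
            + w.speed_gain * (v_avg_after - v_avg_before)  # average speed difference
            - w.switch_penalty * (1.0 if switched else 0.0))  # whether lights switched
```

The switching penalty discourages the model from toggling the lights on every step, which would waste the fixed switching duration.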

In this embodiment, in step (3), the intersection traffic safety criterion is a basic constraint on safe passage through the intersection, ensuring that traffic flows in different lanes do not collide. In the scheduling model, the safety criterion can be built into the traffic light state space: for a standard intersection, for example, the effective state space of the traffic lights can be taken to contain 8 states, as shown in FIG. 3, and the model then selects, based on the input road condition state, the state with the largest reward among these 8 as the traffic light target state.
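Selecting the target state from the 8 effective states is an argmax over the model's reward estimates. The sketch assumes the model returns one estimate (for example a Q-value) per state; the optional `mask` for excluding temporarily unavailable states is a hypothetical extension, and ties are broken toward the lowest index.

```python
from typing import Optional, Sequence

N_LIGHT_STATES = 8  # effective traffic light states of a standard intersection, per the embodiment


def select_target_state(values: Sequence[float], mask: Optional[Sequence[bool]] = None) -> int:
    """Index of the state with the largest estimated reward; ties go to the lowest index.

    `values` holds one reward estimate per effective state; `mask` (hypothetical)
    marks states that may be chosen.
    """
    if len(values) != N_LIGHT_STATES:
        raise ValueError(f"expected {N_LIGHT_STATES} values, got {len(values)}")
    allowed = [i for i in range(N_LIGHT_STATES) if mask is None or mask[i]]
    if not allowed:
        raise ValueError("no traffic light state is available")
    return max(allowed, key=lambda i: values[i])
```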

In this embodiment, in step (4), the intersection road condition state data and the intersection prior knowledge are input into the scheduling model to obtain the target state of the traffic lights; if the current traffic light state matches the target state, no switching action is performed, otherwise the traffic lights are switched to the target state.

Finally, it should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An urban traffic jam scheduling method based on reinforcement learning, characterized in that the method comprises the following steps:

(3) Adopting a reinforcement learning algorithm whose input features comprise, for each lane of an intersection, the average vehicle speed, the number of vehicles, the vehicle positions, the number of lanes, the lane type, the weather state, the accident state and the passing efficiency; at a given moment, the scheduling model selects, according to the intersection road condition state data and the intersection traffic safety criterion, a traffic light state switching action from the action space of traffic light state switching; a reward signal is calculated from the reward function and the passing effect of each lane of the intersection as fed back by the environment; over multiple iterations the model learns to select actions that maximize the reward, thereby training the scheduling model;

2. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in the step (1), the running speed of the vehicle approaching the intersection is also obtained through a laser radar, and the environmental state information of the intersection is also obtained through a temperature sensor and a humidity sensor.

3. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in step (2), data preprocessing (data cleaning and feature construction) is performed on the real-time data on vehicle quantity, vehicle queues and traffic light states, and any one of the CNN, MLP, GBDT and SVM machine learning algorithms is then used to extract structured real-time road condition features as input to the scheduling model.

4. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in step (2), the intersection prior knowledge includes road segment speed limit, steering limit, number of lanes, lane category, and traffic light switching duration.

5. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in step (3), the reinforcement learning algorithm comprises a Q-learning or temporal-difference algorithm; the input features of the reinforcement learning algorithm and the variables of the reward function are obtained from the intersection road condition state data of step (2); the passing efficiency among the input features is calculated by formula (1); and the variables of the reward function comprise the number of vehicles passing, the vehicle waiting time, the difference in average vehicle speed before and after passing, and whether the traffic lights are switched;

6. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in step (3), the intersection traffic safety criterion is a basic constraint on the safe traffic of the intersection, so as to ensure that traffic of each lane does not collide.

7. The reinforcement learning-based urban traffic congestion scheduling method according to claim 1, wherein: in step (4), the intersection road condition data and the intersection prior knowledge are input into the scheduling model to obtain the target state of the traffic lights; if the current traffic light state matches the target state, no switching action is performed, otherwise the traffic lights are switched to the target state.