CN117275240B - Traffic signal reinforcement learning control method and device considering multiple types of driving styles - Google Patents

Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Info

Publication number
CN117275240B
CN117275240B (application CN202311554142.8A)
Authority
CN
China
Prior art keywords
vehicle
reinforcement learning
driving style
traffic
driving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311554142.8A
Other languages
Chinese (zh)
Other versions
CN117275240A (en)
Inventor
徐图
庞钰琪
李碧清
曲鑫
朱永东
华炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311554142.8A
Publication of CN117275240A
Application granted
Publication of CN117275240B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 - Traffic data processing
    • G08G1/0129 - Traffic data processing for creating historical data or processing based on historical data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/092 - Reinforcement learning
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137 - Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic signal reinforcement learning control method and device considering multiple types of driving styles, comprising the following steps: determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection; acquiring real-time track data of vehicles around the intersection and identifying each vehicle's driving style in real time based on the determined categories; setting up a reinforcement learning environment comprising a state space, an action space and a reward function; training the reinforcement learning agent; and deploying the trained agent at the intersection to control the traffic signal. Compared with traditional traffic signal control methods, the method accounts for real-time traffic flow and is more intelligent; compared with other reinforcement learning traffic control methods, it considers multiple types of driving styles, which helps to further improve traffic efficiency.

Description

Traffic signal reinforcement learning control method and device considering multiple types of driving styles
Technical Field
The invention relates to the technical field of intelligent traffic, in particular to a traffic signal reinforcement learning control method and device considering multiple types of driving styles.
Background
Numerous studies have shown that, in a fully connected autonomous driving environment, the presence of autonomous vehicles can improve traffic flow efficiency and traffic flow stability. However, these studies make ideal assumptions about the maturity of vehicle networking technology, the controllability of autonomous driving, and its penetration rate. When autonomous vehicles cannot be controlled directly, a more feasible approach remains optimizing the roadside traffic control system (e.g., signal timing and variable speed limits) to reduce traffic delay.
In recent years, with the deep integration of new technologies such as big data, the Internet of Vehicles, and artificial intelligence into the transportation industry, traffic control research has been shifting from traditional traffic engineering methods toward artificial intelligence methods represented by reinforcement learning. Previous scholars have demonstrated that reinforcement-learning-based traffic signal control can improve traffic efficiency at intersections. However, future traffic flows will mix autonomous and human-driven vehicles and will therefore contain a large number of different driving styles: human driving styles are diverse and highly random, and although autonomous vehicles use deterministic algorithms, different vehicle brands coexist in the traffic flow, and even autonomous vehicles of the same brand offer different driving modes. These multiple driving styles make the effect of an optimization algorithm hard to control. In view of these difficulties, a traffic control optimization method that considers multiple types of driving styles is needed, thereby promoting the industrial application of autonomous driving and the development of emerging industries such as intelligent transportation and the Internet of Things.
Therefore, under a connected autonomous driving environment, the method uses the track information of all vehicles around the intersection to rapidly classify driving styles, and takes the vehicle driving styles together with information such as intersection topology and vehicle positions as state variables of the reinforcement learning environment, thereby realizing a reinforcement-learning-based traffic signal control method that considers multiple driving styles.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a traffic signal reinforcement learning control method and device considering multiple types of driving styles.
The aim of the invention is realized by the following technical scheme: a traffic signal reinforcement learning control method considering a plurality of types of driving styles, comprising:
determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection;
acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle;
setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
training the reinforcement learning agent;
and (5) using the trained reinforcement learning intelligent agent to control traffic.
Further, determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection comprises:
acquiring space-time track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration.
Further, acquiring real-time track data of the vehicles around the intersection and acquiring each vehicle's driving style in real time based on the determined driving-style categories comprises:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its historical track x, calculating the likelihood function value L(Θ, x), and finding the set of model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration.
Further, the training of the reinforcement learning agent includes:
the vehicle simulation tool is adopted to simulate the reinforcement learning environment, and state space, action space and rewarding function in the simulation environment are transmitted to the reinforcement learning intelligent agent, so that the reinforcement learning intelligent agent is trained.
Further, the vehicle simulation tool includes a SUMO simulation tool.
Further, the reinforcement learning agent selects the optimal action using a deep Q-learning method; specifically: a deep neural network η(θ): S → A is used to estimate the action-value function; the reinforcement learning environment state s_t at time t is input into the neural network η(θ), which outputs a score for every action in the action space A, and the action with the highest score is the optimal action;
during training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
wherein α is the learning rate, γ is the discount rate, r_{t+1} is the reward observed from the environment at time t+1, θ is the neural network parameter, and S is the state space.
Further, the performing traffic control using the trained reinforcement learning agent includes:
the method comprises the steps of obtaining position, speed and track information of all vehicles driving to an intersection through vehicle networking, judging driving style of each vehicle, and obtaining traffic state in real time; and transmitting the traffic state to the reinforcement learning intelligent agent, and selecting and executing the optimal action by the reinforcement learning intelligent agent according to the trained action-cost function at each moment.
A traffic signal reinforcement learning control device considering multiple types of driving styles, comprising:
an offline clustering module, used for determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection;
an online identification module, used for acquiring real-time track data of vehicles around the intersection and acquiring each vehicle's driving style in real time based on the determined driving-style categories;
an agent training module, used for setting up the reinforcement learning environment and training the reinforcement learning agent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing the actions of the reinforcement learning agent at the intersection; the reward function is used for representing the average delay of the vehicles;
and a traffic control module, used for performing traffic control with the trained reinforcement learning agent.
A traffic signal reinforcement learning control device considering multiple types of driving styles, comprising one or more processors configured to implement the above traffic signal reinforcement learning control method considering multiple types of driving styles.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, is adapted to implement a traffic signal reinforcement learning control method that takes into account multiple types of driving styles as described above.
The beneficial effects of the invention are as follows: compared with traditional traffic signal control methods, the method accounts for real-time traffic flow and is more intelligent; compared with conventional reinforcement learning algorithms, the method adds the vehicle driving style to the state variables of the reinforcement learning environment, so the state information available to the reinforcement learning algorithm is richer, the training effect of the algorithm is improved, and the traffic efficiency of the intersection is increased.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a traffic signal reinforcement learning control method considering multiple driving styles according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of an offline clustering module and an online recognition module;
FIG. 3 is a schematic diagram of reinforcement learning environment state variable settings;
FIG. 4 is a schematic diagram of a reinforcement learning agent training and SUMO simulation interaction method;
fig. 5 is a hardware configuration diagram provided in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
In a connected autonomous driving environment, the method relies on the track information of all vehicles around the intersection to rapidly classify driving styles, and takes the vehicle driving styles together with information such as intersection topology and vehicle positions as state variables of the reinforcement learning environment, thereby realizing a reinforcement-learning-based traffic signal control method that considers multiple driving styles.
Example 1
As shown in fig. 1, in a first aspect, the present invention provides a traffic signal reinforcement learning control method considering a plurality of types of driving styles. The method comprises the following steps:
step one: determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection;
specifically, the overall style types of drivers around the intersection are learned, namely, the number and the type of driving styles are determined. Adopting a Next Generation Simulation (NGSIM) traffic track data set as historical data, selecting 666 vehicles, and extracting 13 vehicle following behavior characterization indexes, namely maximum speed, average speed, speed standard deviation, maximum acceleration, maximum deceleration, average acceleration, acceleration standard deviation, maximum following interval, minimum following interval, average following interval, standard deviation of following interval, average speed difference and standard deviation of speed difference, through complete space-time track of each vehicle within 15 seconds; the speed difference refers to the speed difference with the preceding vehicle. These characterization fingers are labeledThe main component analysis method is used for reducing the dimension of the data, and more than 90% of data variance is reserved, so that 2 main component elements are obtained: />. Thereby reducing the data to 2 dimensions.
Then, classifying the vehicles by adopting a K-means cluster analysis method, selecting the most suitable K value by using an elbow rule, wherein the y axis is SSE (Sum of the Squared Errors-sum of squares error), the x axis is the value of K, the SSE is reduced along with the increase of x, and the value of K is taken when the descending amplitude obviously tends to be slow. In the present embodiment, a K value of 3 is obtained, thereby classifying the driving style of the vehicle into 3 categories.
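By way of illustration only, a minimal Python sketch of this offline clustering step is given below. It is a sketch under assumptions, not the claimed implementation: the trajectory container and its field names are hypothetical, and the 13 indexes are computed from assumed per-timestep arrays of speed, acceleration, following gap and speed difference.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def characterization_indices(traj):
    """13 car-following indexes for one vehicle; traj holds per-timestep arrays."""
    v, a, gap, dv = traj["speed"], traj["accel"], traj["gap"], traj["dv"]
    return np.array([
        v.max(), v.mean(), v.std(),                   # speed indexes
        a.max(), a.min(), a.mean(), a.std(),          # acceleration / deceleration indexes
        gap.max(), gap.min(), gap.mean(), gap.std(),  # following-gap indexes
        dv.mean(), dv.std(),                          # speed-difference indexes
    ])

# trajectories: assumed list of per-vehicle track records (666 in this embodiment).
X = np.stack([characterization_indices(t) for t in trajectories])

# PCA keeping more than 90% of the variance (2 components in this embodiment).
Z = PCA(n_components=0.90).fit_transform(X)

# Elbow rule: inspect SSE (inertia) against K, then cluster with the chosen K = 3.
sse = {k: KMeans(n_clusters=k, n_init=10).fit(Z).inertia_ for k in range(1, 9)}
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
```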
For a vehicle belonging to the k-th driving-style class, the acceleration a_IDM of the vehicle is calculated by means of the IDM car-following model:
a_IDM = a_m · [1 - (v / v*)^δ - (d* / d)^2], with d* = d_min + v·T + v·(v - v_l) / (2·√(a_m · b_comf))
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, b_comf is the comfortable deceleration, v is the current speed of the ego vehicle, v_l is the current speed of the preceding vehicle, d is the current distance between the ego vehicle and the preceding vehicle, d* is an intermediate model quantity (the desired gap), and δ is the acceleration exponent. Further, for the track data of each driving-style class, the model parameters Θ = {v*, T, d_min, a_m, b_comf} are estimated so that the error between the model output and the actual values is minimized. In total, 3 different parameter sets are obtained: Θ_1, Θ_2, Θ_3, and the driving styles are categorized according to the values of each parameter set, such as the minimum safe vehicle distance and the maximum acceleration of the ego vehicle: aggressive, intermediate, conservative. See fig. 3. If there are more vehicle classes, a numerical value can be assigned to each class according to its degree of aggressiveness, from conservative to aggressive.
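A minimal sketch of this IDM acceleration computation follows, assuming the standard IDM form with the parameters listed above; the acceleration exponent value δ = 4 is the customary default and is an assumption here.

```python
import numpy as np

def idm_acceleration(v, v_lead, d, theta, delta=4.0):
    """IDM car-following acceleration; theta = (v_star, T, d_min, a_m, b_comf)."""
    v_star, T, d_min, a_m, b_comf = theta
    # Desired gap d*: minimum distance, time-headway term, braking-interaction term.
    d_star = d_min + v * T + v * (v - v_lead) / (2.0 * np.sqrt(a_m * b_comf))
    return a_m * (1.0 - (v / v_star) ** delta - (d_star / d) ** 2)
```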
Step two: acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle;
in the actual traffic control process, the driving style of the vehicle is acquired in real time, the real-time requirement on the algorithm is high, and the driving style needs to be rapidly identified by adopting a small amount of track data; for rapid recognition of the driving style of a vehicle, a maximum likelihood estimation method is used, i.e. predicted by an IDM modelAnd +.>Comparing, assuming the actual acceleration +>Meets normal distribution, and the average value is +.>,/>Is a normally distributed random variable, +.>Is->Standard deviation of (2):
for each vehicle in the crossing environment, the historical track x of the vehicle can be obtainedAnd calculate the likelihood function value +.>
Wherein n is the number of sampling points, t i The sampling time corresponding to the ith sampling point.
Find the leadMaximum value group +.>IDM model parameter as the vehicle +.>And determining the driving style of the vehicle according to the classification result in the step one.
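The parameter search can be sketched as below, reusing the idm_acceleration sketch above; the residual standard deviation σ, the initial guess, the optimizer choice and the assign_style helper (mapping fitted parameters to the nearest of the K calibrated classes) are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, traj, sigma=0.3):
    """Negative Gaussian log-likelihood (up to a constant) of observed accelerations."""
    a_pred = idm_acceleration(traj["v"], traj["v_lead"], traj["d"], theta)
    resid = traj["a_real"] - a_pred   # a_real(t_i) - a_IDM(t_i) at each sampling point
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + resid.size * np.log(sigma)

theta0 = np.array([15.0, 1.5, 2.0, 1.5, 2.0])  # initial (v*, T, d_min, a_m, b_comf)
theta_hat = minimize(neg_log_likelihood, theta0, args=(traj,),  # traj: one vehicle's data
                     method="Nelder-Mead").x   # maximum-likelihood IDM parameters
style = assign_style(theta_hat)                # hypothetical mapping to a style class
```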
Through the above steps, rapid identification of driving styles at the intersection is realized, which serves two purposes: (1) the driving style of each vehicle is recorded as an environment state variable, enriching the state information; (2) in traffic simulation, describing the driving style makes the prediction of vehicle tracks more accurate, so the environment can output more accurate reward values, which facilitates the training of the reinforcement learning algorithm.
Step three: setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
state space S: for each lane of the four directions of the intersection, starting from the stop lineThe distance of length is divided into equal length cells, each cell having a length +.>. Thus, the state around the intersection can be used +.>A matrix of dimensions. The state space contains 4 +.>A matrix. As shown in fig. 3, matrix 1: representing the position of the vehicle, if the vehicle exists in the cell, the cell is marked as 1, and if the vehicle does not exist, the cell is marked as 0; matrix 2: representing the speed of the vehicle, if a vehicle exists in a cell, recording the speed of the vehicle, and if the vehicle does not exist, recording as 0; matrix 3: representing the driving style of the vehicle, recording the driving style of the vehicle if the vehicle is in the cell>If there is no car, it is marked as 0; matrix 4: representing the state of the signal lamp, if the signal lamp exists in the cell, recording the current state of the signal lamp (the digital red, yellow and green characterization), and if the signal lamp does not exist, recording other numbers. Compared with other reinforcement learning methods, the method integrates the driving style of the vehicle, so that the information of the state space is more abundant, and the control effect is better.
Action space A: after observing the environment, the reinforcement learning agent needs to select a corresponding action from the action space. First, a green light (G) is defined as passable, a yellow light (Y) as passable with caution, and a red light (R) as impassable; E, S, W and N denote east, south, west and north respectively, and a left turn is denoted L. Four actions can then be selected: A = {NSG, EWG, NSLG, EWLG}, where NSG means the north-south direction has a green light and the east-west direction a red light, EWG means the east-west direction has a green light and the north-south direction a red light, NSLG is the north-south left-turn priority signal, and EWLG is the east-west left-turn priority signal. At time t, the agent selects an action a_t from the action space A; if a_t is the same as a_{t-1}, the signal phase is kept unchanged, and if a_t differs from a_{t-1}, a corresponding yellow-light phase is inserted between the phase transitions.
Reward function R: after observing the environment and selecting and executing an action from the action space, the agent receives a corresponding reward from the environment. In this method, the reward is defined through the average vehicle delay: the delay of a vehicle is t_drive - d_travelled / v*, where t_drive is the time the vehicle has been driving, d_travelled is the distance the vehicle has travelled, and v* is the target speed of the vehicle.
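A minimal sketch of the delay-based reward follows; the per-vehicle record fields and the sign convention (negating the average delay so that less delay yields a higher reward) are assumptions.

```python
def average_delay(vehicles):
    """Per-vehicle delay: elapsed driving time minus free-flow time at the target speed."""
    delays = [veh.t_drive - veh.dist / veh.v_star for veh in vehicles]
    return sum(delays) / max(len(delays), 1)

reward = -average_delay(vehicles_near_intersection)  # assumed sign convention
```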
A reinforcement learning environment is set for the reinforcement learning agent; specifically, the agent selects the optimal action with a deep Q-learning method. A deep neural network η(θ): S → A is used to estimate the action-value function; the input dimension of the first layer equals the flattened dimension of the 4 state-space matrices and its output dimension is 512, while the input dimension of the second layer is 512 and its output dimension is 4. After training, the environment state s_t is input into the network η(θ), which outputs a 4-dimensional vector whose elements are the scores of the actions in the action space A; the action a_t with the highest score is selected as the optimal action. During training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
where α is the learning rate, γ is the discount rate, and r_{t+1} is the reward observed from the environment at time t+1; the neural network parameters θ are trained with gradient descent until the algorithm converges.
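A minimal PyTorch sketch of the described two-layer Q-network and Bellman update is given below; the flattened input dimension, learning rate and discount rate values are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM = 4 * 8 * 20  # flattened (4, N_LANES, N_CELLS) state, dimensions assumed
N_ACTIONS = 4           # {NSG, EWG, NSLG, EWLG}

q_net = nn.Sequential(  # layer 1: state -> 512; layer 2: 512 -> 4 action scores
    nn.Linear(STATE_DIM, 512), nn.ReLU(), nn.Linear(512, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95            # assumed discount rate

def dqn_update(s, a, r, s_next):
    """One gradient step toward the Bellman target r + gamma * max_a' Q(s', a')."""
    q_sa = q_net(s)[a]
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max()
    loss = (target - q_sa) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```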
Step four: training the reinforcement learning agent; the fourth step is specifically realized through the following substeps:
and (4.1) training and evaluating verification by adopting an SUMO (speeded up object model) transportation simulation tool auxiliary algorithm, wherein the training and evaluation verification comprises road network construction, signal lamp setting, traffic flow generation, delay calculation and the like.
(4.2) The road network of the intersection where the traffic signal control method is to be deployed is reproduced in the SUMO simulation tool, including the number of lanes on each approach and the lane arrangement, and signal phases consistent with the actual intersection are constructed. Further, when traffic flows are generated in the SUMO simulation, they are generated according to the K styles determined in step one, with different IDM model parameters for each style.
(4.3) After the simulation starts, the TraCI interface is used to interact with the agent, and the state space, action space and reward function of the simulated environment are transmitted to the reinforcement learning agent, helping the agent complete its training, as shown in fig. 4.
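A minimal sketch of the training loop against SUMO via TraCI follows; the configuration file name, traffic-light id, phase indices and the observe/select_action helpers are assumptions, and the yellow-phase insertion between differing actions is omitted for brevity.

```python
import traci  # SUMO's Python TraCI API

traci.start(["sumo", "-c", "intersection.sumocfg"])  # assumed config file
state = observe()                                    # hypothetical: builds the 4 matrices as a tensor
for step in range(3600):
    action = select_action(state)                    # hypothetical: epsilon-greedy on q_net
    traci.trafficlight.setPhase("tls0", action)      # "tls0" and phase order are assumed
    traci.simulationStep()
    next_state = observe()
    reward = -average_delay(tracked_vehicles())      # hypothetical vehicle bookkeeping
    dqn_update(state, action, reward, next_state)
    state = next_state
traci.close()
```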
Step five: performing traffic control with the trained agent; the fifth step is specifically realized through the following substeps:
(5.1) using the trained reinforcement learning agent to control traffic at the actual intersection;
and (5.2) acquiring the position, speed and track information of all vehicles driving to the intersection through vehicle networking. Judging the driving style of each vehicle according to the determined driving style category of the vehicle in the first step; thereby acquiring traffic state in real time;
(5.3) The traffic state is transmitted to the reinforcement learning agent, and at every moment the agent returns the action with the highest score according to the trained action-value function and executes it.
In a second aspect, the present invention also provides a traffic signal reinforcement learning control apparatus considering multiple types of driving styles, including:
and an offline clustering module: as shown in fig. 2, for determining a category of a driving style of a vehicle based on historical track data of vehicles around an intersection;
the on-line identification module is used for acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle as shown in fig. 2;
the intelligent training module is used for setting a reinforcement learning environment and training the reinforcement learning intelligent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
and the traffic control module is used for carrying out traffic control by using the trained reinforcement learning intelligent agent.
Corresponding to the embodiment of the traffic signal reinforcement learning control method considering the multi-type driving style, the invention also provides an embodiment of the traffic signal reinforcement learning control device considering the multi-type driving style.
Referring to fig. 5, a traffic signal reinforcement learning control device considering multiple types of driving styles provided in an embodiment of the present invention includes one or more processors for implementing a traffic signal reinforcement learning control method considering multiple types of driving styles in the above embodiment.
An embodiment of the traffic signal reinforcement learning control device considering multiple types of driving styles can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented in software, in hardware, or in a combination of hardware and software. Taking a software implementation as an example, the device in a logical sense is formed by the processor of the data-processing device reading the corresponding computer program instructions from non-volatile memory into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of the data-processing device on which the traffic signal reinforcement learning control device considering multiple driving styles is located; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 5, the data-processing device in an embodiment generally includes other hardware according to its actual function, which is not described here again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement this without creative effort.
The embodiment of the present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a traffic signal reinforcement learning control method that considers multiple types of driving styles in the above-described embodiment.
The computer-readable storage medium may be an internal storage unit of any of the aforementioned data-processing devices, such as a hard disk or memory. It may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the data-processing device. The computer-readable storage medium is used to store the computer program and other programs and data required by the data-processing device, and may also be used to temporarily store data that has been or will be output.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.

Claims (8)

1. A traffic signal reinforcement learning control method considering a plurality of types of driving styles, comprising:
determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection; the method comprises the following steps:
acquiring historical track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration;
acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle; the method comprises the following steps:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its actual track x, calculating the likelihood function value L(Θ, x), and finding the set of IDM car-following model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration;
setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
setting a reinforcement learning environment and training the reinforcement learning agent; the agent is a traffic signal lamp;
and performing traffic control with the trained reinforcement learning agent.
2. The method for controlling reinforcement learning of traffic signals considering multiple types of driving styles according to claim 1, wherein the training of the reinforcement learning agent is specifically:
the vehicle simulation tool is adopted to simulate the reinforcement learning environment, and state space, action space and rewarding function in the simulation environment are transmitted to the reinforcement learning intelligent agent, so that the reinforcement learning intelligent agent is trained.
3. The method of claim 2, wherein the vehicle simulation tool comprises a SUMO simulation tool.
4. The method for controlling reinforcement learning of traffic signals in consideration of multiple types of driving styles according to claim 1, wherein the reinforcement learning agent selects the optimal action using a deep Q-learning method; specifically: a deep neural network η(θ): S → A is used to estimate the action-value function; the reinforcement learning environment state s_t at time t is input into the neural network η(θ), which outputs the score of each action a_t in the action space A; the action with the highest score is the optimal action;
during training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
wherein α is the learning rate, γ is the discount rate, r_{t+1} is the reward observed from the environment at time t+1, θ is the neural network parameter, and S is the state space.
5. The method for controlling traffic signal reinforcement learning taking into account multiple types of driving styles according to claim 1, wherein the traffic control using the trained reinforcement learning agent is specifically:
the method comprises the steps of obtaining position, speed and track information of all vehicles driving to an intersection through vehicle networking, judging driving style of each vehicle, and obtaining traffic state in real time; and transmitting the traffic state to the reinforcement learning intelligent agent, and selecting and executing the optimal action by the reinforcement learning intelligent agent according to the trained action-cost function at each moment.
6. A traffic signal reinforcement learning control device considering multiple types of driving styles, characterized by comprising:
an offline clustering module, used for determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection; specifically:
acquiring historical track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration;
the on-line identification module is used for acquiring real-time track data of vehicles around the intersection and acquiring the driving style of the vehicle in real time by combining the determined driving style category of the vehicle; the method comprises the following steps:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its actual track x, calculating the likelihood function value L(Θ, x), and finding the set of IDM car-following model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration;
the intelligent training module is used for setting a reinforcement learning environment and training the reinforcement learning intelligent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle; the intelligent body is a traffic signal lamp;
and a traffic control module, used for performing traffic control with the trained reinforcement learning agent.
7. A traffic signal reinforcement learning control apparatus taking into account a plurality of types of driving styles, characterized by comprising one or more processors for implementing a traffic signal reinforcement learning control method taking into account a plurality of types of driving styles as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium having a program stored thereon, which when executed by a processor, is for implementing a traffic signal reinforcement learning control method taking into account a multi-type driving style as claimed in any one of claims 1 to 5.
CN202311554142.8A 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles Active CN117275240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311554142.8A CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311554142.8A CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Publications (2)

Publication Number Publication Date
CN117275240A CN117275240A (en) 2023-12-22
CN117275240B (en) 2024-02-20

Family

ID=89221916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311554142.8A Active CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Country Status (1)

Country Link
CN (1) CN117275240B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11327492B2 (en) * 2019-12-03 2022-05-10 Mitsubishi Electric Research Laboratories, Inc. Adaptive control of autonomous or semi-autonomous vehicle

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013109472A1 (en) * 2012-01-17 2013-07-25 On Time Systems, Inc. Driver safety enhancement using intelligent traffic signals and gps
KR20220102395A (en) * 2021-01-13 2022-07-20 부경대학교 산학협력단 System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles
CN113312752A (en) * 2021-04-26 2021-08-27 东南大学 Traffic simulation method and device for main road priority control intersection
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN114360266A (en) * 2021-12-20 2022-04-15 东南大学 Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN114446049A (en) * 2021-12-29 2022-05-06 北京理工大学 Traffic flow prediction method, system, terminal and medium based on social value orientation
WO2023123885A1 (en) * 2021-12-31 2023-07-06 上海商汤智能科技有限公司 Traffic signal control method and apparatus, and electronic device, storage medium and program product
CN115285135A (en) * 2022-07-14 2022-11-04 湖北汽车工业学院 Construction method of deep reinforcement learning vehicle following model fusing driving style
CN115257745A (en) * 2022-07-21 2022-11-01 同济大学 Automatic driving lane change decision control method based on rule fusion reinforcement learning
CN116946183A (en) * 2023-07-17 2023-10-27 江苏大学 Commercial vehicle driving behavior prediction method considering driving capability and vehicle equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Junchen Jin. An Agent-Based Traffic Recommendation System: Revisiting and Revising Urban Traffic Management Strategies. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(11): 7289-7301. *
Wu Bing; Luo Xue; Li Linbo. A fuzzy control car-following model considering driving style. Journal of Tongji University (Natural Science), 2020, (05): 70-77. *

Also Published As

Publication number Publication date
CN117275240A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN111696370B (en) Traffic light control method based on heuristic deep Q network
CN108595823B (en) Autonomous main vehicle lane changing strategy calculation method combining driving style and game theory
US11480972B2 (en) Hybrid reinforcement learning for autonomous driving
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
WO2022052406A1 (en) Automatic driving training method, apparatus and device, and medium
CN112347993B (en) Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN114170789B (en) Intelligent network link lane change decision modeling method based on space-time diagram neural network
CN111572562A (en) Automatic driving method, device, equipment, system, vehicle and computer readable storage medium
CN110182217A (en) A kind of traveling task complexity quantitative estimation method towards complicated scene of overtaking other vehicles
CN114162146B (en) Driving strategy model training method and automatic driving control method
CN113554875A (en) Variable speed-limiting control method for heterogeneous traffic flow of expressway based on edge calculation
CN111907523B (en) Vehicle following optimizing control method based on fuzzy reasoning
CN113561995B (en) Automatic driving decision method based on multi-dimensional reward architecture deep Q learning
CN112141098A (en) Obstacle avoidance decision method and device for intelligent driving automobile
Yuan et al. Prioritized experience replay-based deep q learning: Multiple-reward architecture for highway driving decision making
CN110390398B (en) Online learning method
CN117275240B (en) Traffic signal reinforcement learning control method and device considering multiple types of driving styles
CN115973179A (en) Model training method, vehicle control method, device, electronic equipment and vehicle
CN114954498A (en) Reinforced learning lane change behavior planning method and system based on simulated learning initialization
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
WO2021258847A1 (en) Driving decision-making method, device, and chip
CN115096305A (en) Intelligent driving automobile path planning system and method based on generation of countermeasure network and simulation learning
CN113753049B (en) Social preference-based automatic driving overtaking decision determination method and system
CN117601904B (en) Vehicle running track planning method and device, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant