CN117275240B - Traffic signal reinforcement learning control method and device considering multiple types of driving styles - Google Patents

Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Info

Publication number
CN117275240B
CN117275240B (application CN202311554142.8A)
Authority
CN
China
Prior art keywords
vehicle
reinforcement learning
driving style
traffic
driving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311554142.8A
Other languages
Chinese (zh)
Other versions
CN117275240A (en)
Inventor
徐图
庞钰琪
李碧清
曲鑫
朱永东
华炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311554142.8A
Publication of CN117275240A
Application granted
Publication of CN117275240B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 - Traffic data processing
    • G08G1/0129 - Traffic data processing for creating historical data or processing based on historical data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/092 - Reinforcement learning
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137 - Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic signal reinforcement learning control method and device considering multiple types of driving styles, comprising the following steps: determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection; acquiring real-time track data of vehicles around the intersection and identifying each vehicle's driving style in real time based on the determined categories; setting up a reinforcement learning environment comprising a state space, an action space and a reward function; training the reinforcement learning agent; and deploying the trained agent at the intersection to control the traffic signal. Compared with traditional traffic signal control methods, the method accounts for real-time traffic flow and is more intelligent; compared with other reinforcement learning traffic control methods, it considers multiple types of driving styles, which helps to further improve traffic efficiency.

Description

Traffic signal reinforcement learning control method and device considering multiple types of driving styles
Technical Field
The invention relates to the technical field of intelligent traffic, in particular to a traffic signal reinforcement learning control method and device considering multiple types of driving styles.
Background
Numerous studies have shown that, in a fully connected autonomous driving environment, the presence of autonomous vehicles can improve traffic flow efficiency and traffic flow stability. However, these studies make ideal assumptions about the maturity of vehicle networking technology, the controllability of autonomous driving, and its penetration rate. When autonomous vehicles cannot be controlled directly, a more feasible approach remains optimizing the roadside traffic control system (e.g., signal timing and variable speed limits) to reduce traffic delay.
In recent years, with the deep integration of new technologies such as big data, the Internet of Vehicles, and artificial intelligence into the transportation industry, traffic control research has been shifting from traditional traffic engineering methods toward artificial intelligence methods represented by reinforcement learning. Previous scholars have demonstrated that reinforcement-learning-based traffic signal control can improve traffic efficiency at intersections. However, future traffic flows will mix autonomous and human-driven vehicles and will therefore contain a large number of different driving styles: human driving styles are diverse and highly random, and although autonomous vehicles use deterministic algorithms, different vehicle brands coexist in the traffic flow, and even autonomous vehicles of the same brand offer different driving modes. These multiple driving styles make the effect of an optimization algorithm hard to control. In view of these difficulties, a traffic control optimization method that considers multiple types of driving styles is needed, thereby promoting the industrial application of autonomous driving and the development of emerging industries such as intelligent transportation and the Internet of Things.
Therefore, under a connected autonomous driving environment, the method uses the track information of all vehicles around the intersection to rapidly classify driving styles, and takes the vehicle driving styles together with information such as intersection topology and vehicle positions as state variables of the reinforcement learning environment, thereby realizing a reinforcement-learning-based traffic signal control method that considers multiple driving styles.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a traffic signal reinforcement learning control method and device considering multiple types of driving styles.
The aim of the invention is realized by the following technical scheme: a traffic signal reinforcement learning control method considering a plurality of types of driving styles, comprising:
determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection;
acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle;
setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
training the reinforcement learning agent;
and (5) using the trained reinforcement learning intelligent agent to control traffic.
Further, determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection comprises:
acquiring space-time track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration.
Further, acquiring real-time track data of the vehicles around the intersection and acquiring each vehicle's driving style in real time based on the determined driving-style categories comprises:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its historical track x, calculating the likelihood function value L(Θ, x), and finding the set of model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration.
Further, the training of the reinforcement learning agent includes:
the vehicle simulation tool is adopted to simulate the reinforcement learning environment, and state space, action space and rewarding function in the simulation environment are transmitted to the reinforcement learning intelligent agent, so that the reinforcement learning intelligent agent is trained.
Further, the vehicle simulation tool includes a SUMO simulation tool.
Further, the reinforcement learning agent selects the optimal action using a deep Q-learning method; specifically: a deep neural network η(θ): S → A is used to estimate the action-value function; the reinforcement learning environment state s_t at time t is input into the neural network η(θ), which outputs a score for every action in the action space A, and the action with the highest score is the optimal action;
during training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
wherein α is the learning rate, γ is the discount rate, r_{t+1} is the reward observed from the environment at time t+1, θ is the neural network parameter, and S is the state space.
Further, the performing traffic control using the trained reinforcement learning agent includes:
the method comprises the steps of obtaining position, speed and track information of all vehicles driving to an intersection through vehicle networking, judging driving style of each vehicle, and obtaining traffic state in real time; and transmitting the traffic state to the reinforcement learning intelligent agent, and selecting and executing the optimal action by the reinforcement learning intelligent agent according to the trained action-cost function at each moment.
A traffic signal reinforcement learning control device considering multiple types of driving styles, comprising:
an offline clustering module, used for determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection;
an online identification module, used for acquiring real-time track data of vehicles around the intersection and acquiring each vehicle's driving style in real time based on the determined driving-style categories;
an agent training module, used for setting up the reinforcement learning environment and training the reinforcement learning agent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing the actions of the reinforcement learning agent at the intersection; the reward function is used for representing the average delay of the vehicles;
and a traffic control module, used for performing traffic control with the trained reinforcement learning agent.
A traffic signal reinforcement learning control device considering multiple types of driving styles, comprising one or more processors configured to implement the above traffic signal reinforcement learning control method considering multiple types of driving styles.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, is adapted to implement a traffic signal reinforcement learning control method that takes into account multiple types of driving styles as described above.
The beneficial effects of the invention are as follows: compared with traditional traffic signal control methods, the method accounts for real-time traffic flow and is more intelligent; compared with conventional reinforcement learning algorithms, the method adds the vehicle driving style to the state variables of the reinforcement learning environment, so the state information available to the reinforcement learning algorithm is richer, the training effect of the algorithm is improved, and the traffic efficiency of the intersection is increased.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a traffic signal reinforcement learning control method considering multiple driving styles according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of an offline clustering module and an online recognition module;
FIG. 3 is a schematic diagram of reinforcement learning environment state variable settings;
FIG. 4 is a schematic diagram of a reinforcement learning agent training and SUMO simulation interaction method;
fig. 5 is a hardware configuration diagram provided in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
In a connected autonomous driving environment, the method relies on the track information of all vehicles around the intersection to rapidly classify driving styles, and takes the vehicle driving styles together with information such as intersection topology and vehicle positions as state variables of the reinforcement learning environment, thereby realizing a reinforcement-learning-based traffic signal control method that considers multiple driving styles.
Example 1
As shown in fig. 1, in a first aspect, the present invention provides a traffic signal reinforcement learning control method considering a plurality of types of driving styles. The method comprises the following steps:
step one: determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection;
specifically, the overall style types of drivers around the intersection are learned, namely, the number and the type of driving styles are determined. Adopting a Next Generation Simulation (NGSIM) traffic track data set as historical data, selecting 666 vehicles, and extracting 13 vehicle following behavior characterization indexes, namely maximum speed, average speed, speed standard deviation, maximum acceleration, maximum deceleration, average acceleration, acceleration standard deviation, maximum following interval, minimum following interval, average following interval, standard deviation of following interval, average speed difference and standard deviation of speed difference, through complete space-time track of each vehicle within 15 seconds; the speed difference refers to the speed difference with the preceding vehicle. These characterization fingers are labeledThe main component analysis method is used for reducing the dimension of the data, and more than 90% of data variance is reserved, so that 2 main component elements are obtained: />. Thereby reducing the data to 2 dimensions.
Then, classifying the vehicles by adopting a K-means cluster analysis method, selecting the most suitable K value by using an elbow rule, wherein the y axis is SSE (Sum of the Squared Errors-sum of squares error), the x axis is the value of K, the SSE is reduced along with the increase of x, and the value of K is taken when the descending amplitude obviously tends to be slow. In the present embodiment, a K value of 3 is obtained, thereby classifying the driving style of the vehicle into 3 categories.
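By way of illustration only, a minimal Python sketch of this offline clustering step is given below. It is a sketch under assumptions, not the claimed implementation: the trajectory container and its field names are hypothetical, and the 13 indexes are computed from assumed per-timestep arrays of speed, acceleration, following gap and speed difference.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def characterization_indices(traj):
    """13 car-following indexes for one vehicle; traj holds per-timestep arrays."""
    v, a, gap, dv = traj["speed"], traj["accel"], traj["gap"], traj["dv"]
    return np.array([
        v.max(), v.mean(), v.std(),                   # speed indexes
        a.max(), a.min(), a.mean(), a.std(),          # acceleration / deceleration indexes
        gap.max(), gap.min(), gap.mean(), gap.std(),  # following-gap indexes
        dv.mean(), dv.std(),                          # speed-difference indexes
    ])

# trajectories: assumed list of per-vehicle track records (666 in this embodiment).
X = np.stack([characterization_indices(t) for t in trajectories])

# PCA keeping more than 90% of the variance (2 components in this embodiment).
Z = PCA(n_components=0.90).fit_transform(X)

# Elbow rule: inspect SSE (inertia) against K, then cluster with the chosen K = 3.
sse = {k: KMeans(n_clusters=k, n_init=10).fit(Z).inertia_ for k in range(1, 9)}
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
```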
For a vehicle belonging to the k-th driving-style class, the acceleration a_IDM of the vehicle is calculated by means of the IDM car-following model:
a_IDM = a_m · [1 - (v / v*)^δ - (d* / d)^2], with d* = d_min + v·T + v·(v - v_l) / (2·√(a_m · b_comf))
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, b_comf is the comfortable deceleration, v is the current speed of the ego vehicle, v_l is the current speed of the preceding vehicle, d is the current distance between the ego vehicle and the preceding vehicle, d* is an intermediate model quantity (the desired gap), and δ is the acceleration exponent. Further, for the track data of each driving-style class, the model parameters Θ = {v*, T, d_min, a_m, b_comf} are estimated so that the error between the model output and the actual values is minimized. In total, 3 different parameter sets are obtained: Θ_1, Θ_2, Θ_3, and the driving styles are categorized according to the values of each parameter set, such as the minimum safe vehicle distance and the maximum acceleration of the ego vehicle: aggressive, intermediate, conservative. See fig. 3. If there are more vehicle classes, a numerical value can be assigned to each class according to its degree of aggressiveness, from conservative to aggressive.
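A minimal sketch of this IDM acceleration computation follows, assuming the standard IDM form with the parameters listed above; the acceleration exponent value δ = 4 is the customary default and is an assumption here.

```python
import numpy as np

def idm_acceleration(v, v_lead, d, theta, delta=4.0):
    """IDM car-following acceleration; theta = (v_star, T, d_min, a_m, b_comf)."""
    v_star, T, d_min, a_m, b_comf = theta
    # Desired gap d*: minimum distance, time-headway term, braking-interaction term.
    d_star = d_min + v * T + v * (v - v_lead) / (2.0 * np.sqrt(a_m * b_comf))
    return a_m * (1.0 - (v / v_star) ** delta - (d_star / d) ** 2)
```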
Step two: acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle;
in the actual traffic control process, the driving style of the vehicle is acquired in real time, the real-time requirement on the algorithm is high, and the driving style needs to be rapidly identified by adopting a small amount of track data; for rapid recognition of the driving style of a vehicle, a maximum likelihood estimation method is used, i.e. predicted by an IDM modelAnd +.>Comparing, assuming the actual acceleration +>Meets normal distribution, and the average value is +.>,/>Is a normally distributed random variable, +.>Is->Standard deviation of (2):
for each vehicle in the crossing environment, the historical track x of the vehicle can be obtainedAnd calculate the likelihood function value +.>
Wherein n is the number of sampling points, t i The sampling time corresponding to the ith sampling point.
Find the leadMaximum value group +.>IDM model parameter as the vehicle +.>And determining the driving style of the vehicle according to the classification result in the step one.
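The parameter search can be sketched as below, reusing the idm_acceleration sketch above; the residual standard deviation σ, the initial guess, the optimizer choice and the assign_style helper (mapping fitted parameters to the nearest of the K calibrated classes) are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, traj, sigma=0.3):
    """Negative Gaussian log-likelihood (up to a constant) of observed accelerations."""
    a_pred = idm_acceleration(traj["v"], traj["v_lead"], traj["d"], theta)
    resid = traj["a_real"] - a_pred   # a_real(t_i) - a_IDM(t_i) at each sampling point
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + resid.size * np.log(sigma)

theta0 = np.array([15.0, 1.5, 2.0, 1.5, 2.0])  # initial (v*, T, d_min, a_m, b_comf)
theta_hat = minimize(neg_log_likelihood, theta0, args=(traj,),  # traj: one vehicle's data
                     method="Nelder-Mead").x   # maximum-likelihood IDM parameters
style = assign_style(theta_hat)                # hypothetical mapping to a style class
```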
Through the above steps, rapid identification of driving styles at the intersection is realized, which serves two purposes: (1) the driving style of each vehicle is recorded as an environment state variable, enriching the state information; (2) in traffic simulation, describing the driving style makes the prediction of vehicle tracks more accurate, so the environment can output more accurate reward values, which facilitates the training of the reinforcement learning algorithm.
Step three: setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
state space S: for each lane of the four directions of the intersection, starting from the stop lineThe distance of length is divided into equal length cells, each cell having a length +.>. Thus, the state around the intersection can be used +.>A matrix of dimensions. The state space contains 4 +.>A matrix. As shown in fig. 3, matrix 1: representing the position of the vehicle, if the vehicle exists in the cell, the cell is marked as 1, and if the vehicle does not exist, the cell is marked as 0; matrix 2: representing the speed of the vehicle, if a vehicle exists in a cell, recording the speed of the vehicle, and if the vehicle does not exist, recording as 0; matrix 3: representing the driving style of the vehicle, recording the driving style of the vehicle if the vehicle is in the cell>If there is no car, it is marked as 0; matrix 4: representing the state of the signal lamp, if the signal lamp exists in the cell, recording the current state of the signal lamp (the digital red, yellow and green characterization), and if the signal lamp does not exist, recording other numbers. Compared with other reinforcement learning methods, the method integrates the driving style of the vehicle, so that the information of the state space is more abundant, and the control effect is better.
Action space A: after observing the environment, the reinforcement learning agent needs to select a corresponding action from the action space. First, a green light (G) is defined as passable, a yellow light (Y) as passable with caution, and a red light (R) as impassable; E, S, W and N denote east, south, west and north respectively, and a left turn is denoted L. Four actions can then be selected: A = {NSG, EWG, NSLG, EWLG}, where NSG means the north-south direction has a green light and the east-west direction a red light, EWG means the east-west direction has a green light and the north-south direction a red light, NSLG is the north-south left-turn priority signal, and EWLG is the east-west left-turn priority signal. At time t, the agent selects an action a_t from the action space A; if a_t is the same as a_{t-1}, the signal phase is kept unchanged, and if a_t differs from a_{t-1}, a corresponding yellow-light phase is inserted between the phase transitions.
Reward function R: after observing the environment and selecting and executing an action from the action space, the agent receives a corresponding reward from the environment. In this method, the reward is defined through the average vehicle delay: the delay of a vehicle is t_drive - d_travelled / v*, where t_drive is the time the vehicle has been driving, d_travelled is the distance the vehicle has travelled, and v* is the target speed of the vehicle.
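A minimal sketch of the delay-based reward follows; the per-vehicle record fields and the sign convention (negating the average delay so that less delay yields a higher reward) are assumptions.

```python
def average_delay(vehicles):
    """Per-vehicle delay: elapsed driving time minus free-flow time at the target speed."""
    delays = [veh.t_drive - veh.dist / veh.v_star for veh in vehicles]
    return sum(delays) / max(len(delays), 1)

reward = -average_delay(vehicles_near_intersection)  # assumed sign convention
```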
A reinforcement learning environment is set for the reinforcement learning agent; specifically, the agent selects the optimal action with a deep Q-learning method. A deep neural network η(θ): S → A is used to estimate the action-value function; the input dimension of the first layer equals the flattened dimension of the 4 state-space matrices and its output dimension is 512, while the input dimension of the second layer is 512 and its output dimension is 4. After training, the environment state s_t is input into the network η(θ), which outputs a 4-dimensional vector whose elements are the scores of the actions in the action space A; the action a_t with the highest score is selected as the optimal action. During training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
where α is the learning rate, γ is the discount rate, and r_{t+1} is the reward observed from the environment at time t+1; the neural network parameters θ are trained with gradient descent until the algorithm converges.
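A minimal PyTorch sketch of the described two-layer Q-network and Bellman update is given below; the flattened input dimension, learning rate and discount rate values are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM = 4 * 8 * 20  # flattened (4, N_LANES, N_CELLS) state, dimensions assumed
N_ACTIONS = 4           # {NSG, EWG, NSLG, EWLG}

q_net = nn.Sequential(  # layer 1: state -> 512; layer 2: 512 -> 4 action scores
    nn.Linear(STATE_DIM, 512), nn.ReLU(), nn.Linear(512, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95            # assumed discount rate

def dqn_update(s, a, r, s_next):
    """One gradient step toward the Bellman target r + gamma * max_a' Q(s', a')."""
    q_sa = q_net(s)[a]
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max()
    loss = (target - q_sa) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```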
Step four: training the reinforcement learning agent; the fourth step is specifically realized through the following substeps:
and (4.1) training and evaluating verification by adopting an SUMO (speeded up object model) transportation simulation tool auxiliary algorithm, wherein the training and evaluation verification comprises road network construction, signal lamp setting, traffic flow generation, delay calculation and the like.
(4.2) The road network of the intersection where the traffic signal control method is to be deployed is reproduced in the SUMO simulation tool, including the number of lanes on each approach and the lane arrangement, and signal phases consistent with the actual intersection are constructed. Further, when traffic flows are generated in the SUMO simulation, they are generated according to the K styles determined in step one, with different IDM model parameters for each style.
(4.3) After the simulation starts, the TraCI interface is used to interact with the agent, and the state space, action space and reward function of the simulated environment are transmitted to the reinforcement learning agent, helping the agent complete its training, as shown in fig. 4.
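A minimal sketch of the training loop against SUMO via TraCI follows; the configuration file name, traffic-light id, phase indices and the observe/select_action helpers are assumptions, and the yellow-phase insertion between differing actions is omitted for brevity.

```python
import traci  # SUMO's Python TraCI API

traci.start(["sumo", "-c", "intersection.sumocfg"])  # assumed config file
state = observe()                                    # hypothetical: builds the 4 matrices as a tensor
for step in range(3600):
    action = select_action(state)                    # hypothetical: epsilon-greedy on q_net
    traci.trafficlight.setPhase("tls0", action)      # "tls0" and phase order are assumed
    traci.simulationStep()
    next_state = observe()
    reward = -average_delay(tracked_vehicles())      # hypothetical vehicle bookkeeping
    dqn_update(state, action, reward, next_state)
    state = next_state
traci.close()
```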
Step five: performing traffic control with the trained agent; the fifth step is specifically realized through the following substeps:
(5.1) using the trained reinforcement learning agent to control traffic at the actual intersection;
and (5.2) acquiring the position, speed and track information of all vehicles driving to the intersection through vehicle networking. Judging the driving style of each vehicle according to the determined driving style category of the vehicle in the first step; thereby acquiring traffic state in real time;
(5.3) The traffic state is transmitted to the reinforcement learning agent, and at every moment the agent returns the action with the highest score according to the trained action-value function and executes it.
In a second aspect, the present invention also provides a traffic signal reinforcement learning control apparatus considering multiple types of driving styles, including:
and an offline clustering module: as shown in fig. 2, for determining a category of a driving style of a vehicle based on historical track data of vehicles around an intersection;
the on-line identification module is used for acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle as shown in fig. 2;
the intelligent training module is used for setting a reinforcement learning environment and training the reinforcement learning intelligent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
and the traffic control module is used for carrying out traffic control by using the trained reinforcement learning intelligent agent.
Corresponding to the embodiment of the traffic signal reinforcement learning control method considering the multi-type driving style, the invention also provides an embodiment of the traffic signal reinforcement learning control device considering the multi-type driving style.
Referring to fig. 5, a traffic signal reinforcement learning control device considering multiple types of driving styles provided in an embodiment of the present invention includes one or more processors for implementing a traffic signal reinforcement learning control method considering multiple types of driving styles in the above embodiment.
An embodiment of the traffic signal reinforcement learning control device considering multiple types of driving styles can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented in software, in hardware, or in a combination of hardware and software. Taking a software implementation as an example, the device in a logical sense is formed by the processor of the data-processing device reading the corresponding computer program instructions from non-volatile memory into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of the data-processing device on which the traffic signal reinforcement learning control device considering multiple driving styles is located; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 5, the data-processing device in an embodiment generally includes other hardware according to its actual function, which is not described here again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement this without creative effort.
The embodiment of the present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a traffic signal reinforcement learning control method that considers multiple types of driving styles in the above-described embodiment.
The computer-readable storage medium may be an internal storage unit of any of the aforementioned data-processing devices, such as a hard disk or memory. It may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the data-processing device. The computer-readable storage medium is used to store the computer program and other programs and data required by the data-processing device, and may also be used to temporarily store data that has been or will be output.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.

Claims (8)

1. A traffic signal reinforcement learning control method considering a plurality of types of driving styles, comprising:
determining a category of a driving style of the vehicle based on historical track data of vehicles around the intersection; the method comprises the following steps:
acquiring historical track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration;
acquiring real-time track data of vehicles around the intersection, and acquiring the driving style of the vehicle in real time by combining the determined driving style types of the vehicle; the method comprises the following steps:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its actual track x, calculating the likelihood function value L(Θ, x), and finding the set of IDM car-following model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration;
setting a reinforcement learning environment comprising a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle;
setting a reinforcement learning environment and training the reinforcement learning agent; the agent is a traffic signal lamp;
and performing traffic control with the trained reinforcement learning agent.
2. The method for controlling reinforcement learning of traffic signals considering multiple types of driving styles according to claim 1, wherein the training of the reinforcement learning agent is specifically:
the vehicle simulation tool is adopted to simulate the reinforcement learning environment, and state space, action space and rewarding function in the simulation environment are transmitted to the reinforcement learning intelligent agent, so that the reinforcement learning intelligent agent is trained.
3. The method of claim 2, wherein the vehicle simulation tool comprises a SUMO simulation tool.
4. The method for controlling reinforcement learning of traffic signals in consideration of multiple types of driving styles according to claim 1, wherein the reinforcement learning agent selects the optimal action using a deep Q-learning method; specifically: a deep neural network η(θ): S → A is used to estimate the action-value function; the reinforcement learning environment state s_t at time t is input into the neural network η(θ), which outputs the score of each action a_t in the action space A; the action with the highest score is the optimal action;
during training, the action-value function is updated with the Bellman equation:
Q(s_t, a_t) = Q(s_t, a_t) + α(r_{t+1} + γ·max_{a∈A} Q(s_{t+1}, a) - Q(s_t, a_t))
wherein α is the learning rate, γ is the discount rate, r_{t+1} is the reward observed from the environment at time t+1, θ is the neural network parameter, and S is the state space.
5. The method for controlling traffic signal reinforcement learning taking into account multiple types of driving styles according to claim 1, wherein the traffic control using the trained reinforcement learning agent is specifically:
the method comprises the steps of obtaining position, speed and track information of all vehicles driving to an intersection through vehicle networking, judging driving style of each vehicle, and obtaining traffic state in real time; and transmitting the traffic state to the reinforcement learning intelligent agent, and selecting and executing the optimal action by the reinforcement learning intelligent agent according to the trained action-cost function at each moment.
6. A traffic signal reinforcement learning control device considering multiple types of driving styles, characterized by comprising:
an offline clustering module, used for determining the categories of vehicle driving styles based on historical track data of vehicles around the intersection; specifically:
acquiring historical track data of a plurality of vehicles over a period of time, extracting characterization indexes, and performing dimensionality reduction with principal component analysis to obtain several principal components; classifying the vehicles with a K-means cluster analysis method and determining the value of K, thereby dividing vehicle driving styles into K classes; obtaining K groups of different model parameters based on the IDM car-following model, one group for each driving-style class; and determining the category of each driving style according to the values of each group of model parameters;
the characterization indexes comprise maximum speed, average speed, standard deviation of speed, maximum acceleration, maximum deceleration, average acceleration, standard deviation of acceleration, maximum following gap, minimum following gap, average following gap, standard deviation of following gap, average speed difference and standard deviation of speed difference; the model parameters comprise the target speed, the safe time headway, the minimum safe vehicle distance, the maximum acceleration of the ego vehicle and the comfortable deceleration;
the on-line identification module is used for acquiring real-time track data of vehicles around the intersection and acquiring the driving style of the vehicle in real time by combining the determined driving style category of the vehicle; the method comprises the following steps:
for each vehicle in the intersection environment, acquiring the actual acceleration a_real from its actual track x, calculating the likelihood function value L(Θ, x), and finding the set of IDM car-following model parameters Θ = {v*, T, d_min, a_m, b_comf} that maximizes L(Θ, x) as the IDM car-following model parameters of the vehicle, thereby determining the driving style of the vehicle;
wherein v* is the target speed, T is the safe time headway, d_min is the minimum safe vehicle distance, a_m is the maximum acceleration of the ego vehicle, and b_comf is the comfortable deceleration;
the intelligent training module is used for setting a reinforcement learning environment and training the reinforcement learning intelligent; the reinforcement learning environment comprises a state space, an action space and a reward function; the state space is used for representing the position of the vehicle, the speed of the vehicle, the driving style of the vehicle and the state of the signal lamp; the action space is used for representing actions of the reinforcement learning intelligent agent at the intersection; the reward function is used for representing the average delay of the vehicle; the intelligent body is a traffic signal lamp;
and a traffic control module, used for performing traffic control with the trained reinforcement learning agent.
7. A traffic signal reinforcement learning control apparatus taking into account a plurality of types of driving styles, characterized by comprising one or more processors for implementing a traffic signal reinforcement learning control method taking into account a plurality of types of driving styles as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium having a program stored thereon, which when executed by a processor, is for implementing a traffic signal reinforcement learning control method taking into account a multi-type driving style as claimed in any one of claims 1 to 5.
CN202311554142.8A 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles Active CN117275240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311554142.8A CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311554142.8A CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Publications (2)

Publication Number Publication Date
CN117275240A CN117275240A (en) 2023-12-22
CN117275240B (en) 2024-02-20

Family

ID=89221916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311554142.8A Active CN117275240B (en) 2023-11-21 2023-11-21 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Country Status (1)

Country Link
CN (1) CN117275240B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11327492B2 (en) * 2019-12-03 2022-05-10 Mitsubishi Electric Research Laboratories, Inc. Adaptive control of autonomous or semi-autonomous vehicle

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013109472A1 (en) * 2012-01-17 2013-07-25 On Time Systems, Inc. Driver safety enhancement using intelligent traffic signals and gps
KR20220102395A (en) * 2021-01-13 2022-07-20 부경대학교 산학협력단 System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles
CN113312752A (en) * 2021-04-26 2021-08-27 东南大学 Traffic simulation method and device for main road priority control intersection
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN114360266A (en) * 2021-12-20 2022-04-15 东南大学 Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN114446049A (en) * 2021-12-29 2022-05-06 北京理工大学 Traffic flow prediction method, system, terminal and medium based on social value orientation
WO2023123885A1 (en) * 2021-12-31 2023-07-06 上海商汤智能科技有限公司 Traffic signal control method and apparatus, and electronic device, storage medium and program product
CN115285135A (en) * 2022-07-14 2022-11-04 湖北汽车工业学院 Construction method of deep reinforcement learning vehicle following model fusing driving style
CN115257745A (en) * 2022-07-21 2022-11-01 同济大学 Automatic driving lane change decision control method based on rule fusion reinforcement learning
CN116946183A (en) * 2023-07-17 2023-10-27 江苏大学 Commercial vehicle driving behavior prediction method considering driving capability and vehicle equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Junchen Jin. An Agent-Based Traffic Recommendation System: Revisiting and Revising Urban Traffic Management Strategies. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(11): 7289-7301. *
Wu Bing; Luo Xue; Li Linbo. A fuzzy control car-following model considering driving style. Journal of Tongji University (Natural Science), 2020, (05): 70-77. *

Also Published As

Publication number Publication date
CN117275240A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN111696370B (en) Traffic light control method based on heuristic deep Q network
CN108595823B (en) Autonomous main vehicle lane changing strategy calculation method combining driving style and game theory
US11480972B2 (en) Hybrid reinforcement learning for autonomous driving
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
WO2022052406A1 (en) Automatic driving training method, apparatus and device, and medium
CN112347993B (en) Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN114170789B (en) Intelligent network link lane change decision modeling method based on space-time diagram neural network
CN111572562A (en) Automatic driving method, device, equipment, system, vehicle and computer readable storage medium
CN110182217A (en) A kind of traveling task complexity quantitative estimation method towards complicated scene of overtaking other vehicles
CN114162146B (en) Driving strategy model training method and automatic driving control method
CN113554875A (en) Variable speed-limiting control method for heterogeneous traffic flow of expressway based on edge calculation
CN111907523B (en) Vehicle following optimizing control method based on fuzzy reasoning
CN113561995B (en) Automatic driving decision method based on multi-dimensional reward architecture deep Q learning
CN112141098A (en) Obstacle avoidance decision method and device for intelligent driving automobile
Yuan et al. Prioritized experience replay-based deep q learning: Multiple-reward architecture for highway driving decision making
CN110390398B (en) Online learning method
CN117275240B (en) Traffic signal reinforcement learning control method and device considering multiple types of driving styles
CN115973179A (en) Model training method, vehicle control method, device, electronic equipment and vehicle
CN114954498A (en) Reinforced learning lane change behavior planning method and system based on simulated learning initialization
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
WO2021258847A1 (en) Driving decision-making method, device, and chip
CN115096305A (en) Intelligent driving automobile path planning system and method based on generation of countermeasure network and simulation learning
CN113753049B (en) Social preference-based automatic driving overtaking decision determination method and system
CN117601904B (en) Vehicle running track planning method and device, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant