CN111258314B

CN111258314B - Collaborative evolution-based decision-making emergence method for automatic driving vehicle

Info

Publication number: CN111258314B
Application number: CN202010065627.0A
Authority: CN
Inventors: 刘章杰; 李慧云
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2022-07-15
Anticipated expiration: 2040-01-20
Also published as: CN111258314A

Abstract

The invention provides a co-evolution-based decision emergence method for autonomous vehicles. The method comprises the following steps: constructing a road model, and determining the initial position distribution and the driving destinations of vehicles in the road model; setting a plurality of candidate driving strategies that constrain the driving speed of a vehicle and its interaction with the preceding and following vehicles, and dividing the vehicles into different populations according to the candidate driving strategies; operating each vehicle according to the vehicle kinematics model, a preset traffic light scheduling strategy, and the candidate driving strategy corresponding to that vehicle; and exploring the relative merits of the candidate driving strategies by taking a plurality of driving indexes of the controlled vehicles as optimization targets. Because the invention optimizes a plurality of global targets, the optimal driving strategy can emerge spontaneously.

Description

Collaborative evolution-based decision emerging method for automatic driving vehicle

Technical Field

The invention relates to the technical field of automatic driving, in particular to a decision-making emerging method of an automatic driving vehicle based on collaborative evolution.

Background

At present, urban traffic is large in scale, varied in flow direction, and highly variable, so that the transportation system and traffic rules can hardly cope with rapidly changing traffic conditions, and individual vehicles cannot be planned and scheduled reasonably and in time.

In the prior art, the centralized scheduling method is generally suitable for the small-scale situation. For example, a centralized dispatch model may address the optimality of traffic light dispatch, improve traffic throughput, and reduce the latency at intersections. However, the computational complexity of centralized optimization algorithms (including reinforcement learning, neural networks, fuzzy logic, etc.) will grow exponentially as the number of intersections, the number of vehicles, and the length of roads increase. Moreover, the dynamic and transient nature of the urban vehicle network makes it difficult to respond to individual vehicles and various traffic conditions in a timely manner through a centralized optimal scheduling algorithm.

Current evolutionary algorithms (such as genetic algorithms) and some intelligent optimization algorithms (such as ant colony, particle swarm, and simulated annealing algorithms) are effective for large-scale problems with an inherent feature distribution. However, these traditional algorithms fail to account for the combined effects of competition and cooperation among individuals and the environment, which become more apparent and prominent as V2X technology between individual vehicles and the environment evolves.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide an automatic driving vehicle decision emerging method based on co-evolution, which observes the emergence of optimal driving strategies under different environments by simulating the co-evolution process of competition and cooperation.

According to a first aspect of the invention, a co-evolution based autonomous vehicle decision making emergence method is provided. The method comprises the following steps:

constructing a road model, and determining the initial position distribution and the driving destination of a vehicle in the road model;

setting a plurality of candidate driving strategies for limiting the driving speed of the vehicle and the interactive relation between the front vehicle and the rear vehicle, and dividing the vehicle into different populations according to the candidate driving strategies;

operating the vehicle according to the vehicle kinematics model, a preset traffic light scheduling strategy and a candidate driving strategy corresponding to the vehicle;

and exploring the advantages and disadvantages of the candidate driving strategies by taking a plurality of driving indexes of the controlled vehicle as an optimization target.

In one embodiment, the constructing the road model and setting the initial position distribution and the driving destination of the vehicle in the road model comprises:

constructing a bidirectional four-lane road model with an intersection, wherein the road is composed of grids, each grid is a rectangle, and the length of each grid is set to be equal to that of a vehicle;

the initial distribution of vehicles and the vehicle destinations will be randomly set, and when the vehicle center point falls within the grid of the road model, the grid is considered occupied.

In one embodiment, the plurality of candidate driving strategies includes:

conservative strategy: the vehicle travels at the maximum speed and, when the preceding vehicle is slower, decelerates without overtaking;

rational strategy: the vehicle travels at the maximum speed and overtakes when the preceding vehicle is slow and no vehicle is merging within the 20-meter area relevant to the lane change;

greedy strategy: the vehicle travels at the maximum speed and always overtakes.

In one embodiment, the traffic signal light scheduling policy is a round robin rotation of fixed time slices, each time slice being 8 seconds.

In one embodiment, the plurality of driving indicators includes at least two of average transit time, accident rate, average emissions, and average energy consumption.

According to a second aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, performs the steps of the above-described method of the invention.

According to a third aspect of the invention, there is provided a computer device comprising a memory and a processor, a computer program being stored on the memory and being executable on the processor, the processor implementing the steps of the method of the invention as described above when executing the program.

Compared with the prior art, the invention has the following advantages: for the problem of large-scale global vehicle optimization, a co-evolution method is introduced; a plurality of global optimization objectives, such as average speed and accident rate, are pursued through cooperation and competition; and the optimal driving strategy emerges spontaneously through the optimization of these global objectives.

Drawings

The invention is illustrated in the following drawings, which are only schematic and explanatory and are not restrictive of the invention, and wherein:

FIG. 1 is a flow diagram of a collaborative evolution based autonomous vehicle decision-making emergence method according to one embodiment of the present invention;

FIG. 2 is a schematic view of a road model according to one embodiment of the invention;

FIG. 3 is a diagram illustrating a normal distribution neighborhood, according to an embodiment of the present invention;

FIG. 4 is a flow chart of an implementation of a co-evolution based autonomous vehicle decision making emergence method according to an embodiment of the present invention;

FIG. 5 is a schematic illustration of a road environment according to one embodiment of the present invention;

FIG. 6 is a diagram illustrating the trend of population according to the number of generations of reproduction according to an embodiment of the present invention;

FIG. 7 is a schematic illustration of a population incident count according to an embodiment of the present invention;

FIG. 8 is a schematic illustration of average velocity according to one embodiment of the present invention;

In the figures, the labels are: Lane; Center point; Vehicle; Conservative policy (conservative strategy); Radical policy (greedy strategy); Rational policy (rational strategy); Generation; Ratio of the subgroup population (offspring population ratio); Number of accidents; Average speed.

Detailed Description

In order to make the objects, technical solutions, design methods, and advantages of the present invention more apparent, the present invention will be further described in detail by specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

The invention provides a multi-objective co-evolution method for large-scale autonomous vehicle clusters, in order to study which driving strategy achieves good overall performance in the long run. First, the driving strategy of large-scale urban autonomous vehicles is formulated as a multi-objective optimization problem. Then, a grid road model and a vehicle dynamics model are established, in which the vehicles interact with each other only indirectly, thereby reducing the interaction complexity from $O(n!)$ (n being the number of interacting vehicles) to $O(n)$. The invention simulates the co-evolution process of competition and cooperation to observe the emergence of the optimal driving strategy in different environments.

Research shows that current centralized optimization algorithms have excessively high computational complexity and cannot handle large-scale traffic optimization problems. On the other hand, traditional evolutionary algorithms and swarm intelligence algorithms include only a single population, which is insufficient for diverse environments involving both competition and cooperation. To address these problems, the invention combines co-evolution with swarm intelligence so as to construct a plurality of cooperating and competing populations, let them co-evolve in a vehicle-road cooperative network, and observe the emergence of an optimal driving strategy.

Referring to fig. 1, the decision making emerging method of the autonomous driving vehicle based on co-evolution provided by the embodiment of the invention comprises the following steps:

Step S110, constructing a vehicle model and a vehicle kinematics model.

Referring to FIG. 2, the vehicle is modeled as a rectangle of length $l$ and width $w$, denoted $c_k$. The center point of the vehicle is represented as $(x_{k,t}, y_{k,t})$, namely:

$$c_{k,t}: (x_{k,t},\, y_{k,t}) \quad (1)$$

where $c_{k,t}$ represents the geometric center of the k-th vehicle at time $t$, $x_{k,t}$ represents the abscissa of the geometric center of the k-th vehicle at time $t$, and $y_{k,t}$ represents the ordinate of the geometric center of the k-th vehicle at time $t$.

In the present invention, a vehicle kinematics model is introduced to describe the behavior of the vehicle, in particular how that behavior affects the environment and promotes competition and cooperation among the populations.

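Specifically, the lane change is represented as:

$$x_{t+1} = x_t + v_{t+1}\cos(\alpha_t + \theta_t)\,\Delta t \quad (3)$$

$$y_{t+1} = y_t + v_{t+1}\sin(\alpha_t + \theta_t)\,\Delta t \quad (4)$$

$$x_{t+2} = x_{t+1} + v_{t+1}\cos(\alpha_t)\,\Delta t \quad (5)$$

$$y_{t+2} = y_{t+1} + v_{t+1}\sin(\alpha_t)\,\Delta t \quad (6)$$

The turn is represented as:

$$x_{t+1} = x_t + v_{t+1}\cos(\alpha_t + \theta_t)\,\Delta t \quad (8)$$

$$y_{t+1} = y_t + v_{t+1}\sin(\alpha_t + \theta_t)\,\Delta t \quad (9)$$

$$x_{t+2} = x_{t+1} + v_{t+1}\cos(\alpha_t + \theta_t + \theta_{t+1})\,\Delta t \quad (10)$$

$$y_{t+2} = y_{t+1} + v_{t+1}\sin(\alpha_t + \theta_t + \theta_{t+1})\,\Delta t \quad (11)$$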

the intersection deceleration is represented as:

wherein:

$v_t$ represents the speed of the vehicle at time $t$;

represents the acceleration of the vehicle at time t;

$\Delta t$ represents a time step;

$x_t$ represents the abscissa of the geometric center of the vehicle at time $t$;

$y_t$ represents the ordinate of the geometric center of the vehicle at time $t$;

$\alpha_t$ represents the attitude angle of the vehicle at time $t$;

$\theta_t$ represents the steering wheel angle of the vehicle at time $t$;

$x_c$ represents the abscissa of the geometric center of the traffic light;

$y_c$ represents the ordinate of the geometric center of the traffic light;

$d_t$ represents the distance of the vehicle from the traffic light at time $t$;
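The lane-change and turn updates in equations (3) to (6) and (8) to (11) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the names `VehicleState`, `lane_change`, and `turn` are hypothetical, angles are assumed to be in radians, and the heading returned after a turn is assumed to accumulate both steering angles.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float      # abscissa of the geometric center
    y: float      # ordinate of the geometric center
    alpha: float  # attitude angle in radians (assumed unit)


def lane_change(state, v_next, theta, dt):
    """Two-step lane change following equations (3) to (6)."""
    # Step 1 (eqs. 3 and 4): move along the steered direction alpha + theta.
    x1 = state.x + v_next * math.cos(state.alpha + theta) * dt
    y1 = state.y + v_next * math.sin(state.alpha + theta) * dt
    # Step 2 (eqs. 5 and 6): move along the original heading alpha.
    x2 = x1 + v_next * math.cos(state.alpha) * dt
    y2 = y1 + v_next * math.sin(state.alpha) * dt
    return VehicleState(x2, y2, state.alpha)


def turn(state, v_next, theta_t, theta_t1, dt):
    """Two-step turn following equations (8) to (11)."""
    # Step 1 (eqs. 8 and 9): heading alpha + theta_t.
    h1 = state.alpha + theta_t
    x1 = state.x + v_next * math.cos(h1) * dt
    y1 = state.y + v_next * math.sin(h1) * dt
    # Step 2 (eqs. 10 and 11): heading alpha + theta_t + theta_{t+1}.
    h2 = h1 + theta_t1
    x2 = x1 + v_next * math.cos(h2) * dt
    y2 = y1 + v_next * math.sin(h2) * dt
    # Assumption: the final heading keeps both steering increments.
    return VehicleState(x2, y2, h2)
```

The difference between the two maneuvers is visible in the second step: a lane change returns to the original heading, while a turn keeps accumulating the steering angle.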

Step S120, constructing a road model.

In one embodiment, the invention builds a two-way four-lane road model with an intersection. The road is made up of grids, each of which is a rectangle with the same width $gw$ and length $gl$. The grid $gr_{i,j}$ in the i-th row and j-th column of the road network has four corner coordinates:

The area within the grid may be described as:

The grid length $gl$ is set equal to the vehicle length, so that no more than one vehicle can have its center point in a single grid. Still referring to FIG. 2, the center of vehicle A is located in grid $gr_{3,2}$, and the center of vehicle B is located in another grid, $gr_{3,3}$. The binary state $g_{i,j}(t)$ is then defined in equation (15). In FIG. 2, when a vehicle center point falls in the grid, the grid is occupied and its state is 1; when there is no vehicle inside, its state is 0. The grid occupancy state is represented as:

$$g_{i,j}(t) = \begin{cases} 1, & \exists\, k:\ c_{k,t} \in gr_{i,j} \\ 0, & \text{otherwise} \end{cases} \quad (15)$$

With the road model and vehicle model constructed, a single vehicle interacts only indirectly with neighboring vehicles. Thus, the interaction complexity is reduced from n:n (where n is the number of vehicles) to 1:1, which greatly reduces the complexity of the model and improves its scalability.
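The grid occupancy state described above can be sketched as follows. The function name `occupancy` is hypothetical, and the layout is an assumption: the grid origin is taken at (0, 0), columns run along the x axis with length `gl`, and rows run along the y axis with width `gw`.

```python
def occupancy(centers, n_rows, n_cols, gl, gw):
    """Return g where g[i][j] is 1 if a vehicle center lies in grid (i, j), else 0.

    centers: iterable of (x, y) geometric centers of the vehicles.
    Assumption: grid origin at (0, 0); columns along x, rows along y.
    """
    g = [[0] * n_cols for _ in range(n_rows)]
    for x, y in centers:
        j = int(x // gl)  # column index along the road
        i = int(y // gw)  # row index across the lanes
        if 0 <= i < n_rows and 0 <= j < n_cols:
            g[i][j] = 1
    return g
```

Because each vehicle writes only its own cell and reads only nearby cells, a vehicle never has to query every other vehicle directly, which is the indirect interaction the paragraph above refers to.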

Step S130, setting a multi-objective optimization problem for the vehicle cluster.

Equation (16) sets forth the multi-objective optimization problem for large-scale urban vehicle clusters. Here $f_1(x) = \sum_i \sum_j g_{i,j}(t)\, g_{i,j+1}(t)$ counts pairs of consecutive grids that are both occupied, since the product $g_{i,j}(t)\, g_{i,j+1}(t)$ is 1 only in that case, and $f_2(x) = -E(v(t))$ is the negative of the overall average speed $E(v(t))$ of all vehicles.

Such that:

$$f_1(x) = \sum_i \sum_j g_{i,j}(t)\, g_{i,j+1}(t) \quad (17)$$

$$f_2(x) = -E(v(t)) \quad (18)$$

$$\alpha_{k,t} = \alpha_{k,t-1} + \theta_{k,t} \quad (23)$$

$$v_{k,t} \le v_{max} \quad (24)$$

$$\theta_{k,t} \le \theta_{max} \quad (26)$$

wherein:

representing the velocity of the kth vehicle at time t;

$v_{k,t}$ represents the velocity of the k-th vehicle at time $t$;

$\alpha_{k,t}$ represents the attitude angle of the k-th vehicle at time $t$;

$\theta_{k,t}$ represents the steering wheel angle of the k-th vehicle at time $t$;

represents the acceleration of the kth vehicle at time t;

$\Delta t$ represents a time step;

$\|\cdot\|$ represents the two-norm.
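The two objectives in equations (17) and (18) can be evaluated directly from an occupancy grid and a list of speeds. The sketch below is illustrative only; the function names `f1` and `f2` mirror the symbols in the text, and `f2` assumes a non-empty list of speeds.

```python
def f1(g):
    """Eq. (17): count pairs of consecutive grids in a row that are both occupied."""
    return sum(row[j] * row[j + 1] for row in g for j in range(len(row) - 1))


def f2(speeds):
    """Eq. (18): negated mean speed, so minimizing f2 maximizes average speed.

    Assumption: speeds is non-empty.
    """
    return -sum(speeds) / len(speeds)
```

Minimizing `f1` keeps vehicles from occupying adjacent grids, while minimizing `f2` pushes the average speed up, so the two objectives pull against each other when traffic is dense.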

Step S140, setting a candidate driving strategy set to constrain the running speed of the vehicle and its interaction with the preceding and following vehicles.

The driving strategy defines how the vehicle drives and how it interacts with surrounding vehicles, and a plurality of candidate driving strategies can be set. For example, the set of candidate driving strategies includes three driving strategies. Conservative strategy: the vehicle travels at the maximum speed and, when the preceding vehicle is slower, decelerates without overtaking. Rational strategy: the vehicle travels at the maximum speed and overtakes when the preceding vehicle is slow and no vehicle is entering a certain area (for example, 20 meters) relevant to the lane change. Greedy strategy: the vehicle travels at the maximum speed and always overtakes.

Further, the vehicles may be divided into different populations according to different candidate driving strategies.
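The three candidate strategies can be sketched as a small decision function. This is one possible reading of the text rather than the patented logic: the names `Strategy` and `decide` and the returned action labels are hypothetical, and "always overtakes" is interpreted as overtaking whenever a slower vehicle is ahead, regardless of whether the lane-change area is clear.

```python
from enum import Enum


class Strategy(Enum):
    CONSERVATIVE = "conservative"
    RATIONAL = "rational"
    GREEDY = "greedy"


def decide(strategy, own_speed, lead_speed, lane_change_zone_clear):
    """Return 'cruise', 'decelerate', or 'overtake' for one vehicle.

    lead_speed is None when there is no vehicle ahead.
    lane_change_zone_clear is True when no vehicle is entering the area
    (for example, 20 meters) relevant to the lane change.
    """
    lead_is_slower = lead_speed is not None and lead_speed < own_speed
    if not lead_is_slower:
        return "cruise"          # nothing to react to: keep driving at speed
    if strategy is Strategy.GREEDY:
        return "overtake"        # assumed: overtakes without checking the zone
    if strategy is Strategy.RATIONAL and lane_change_zone_clear:
        return "overtake"        # overtakes only when the zone is clear
    return "decelerate"          # conservative, or rational with a blocked zone
```

For instance, `decide(Strategy.RATIONAL, 15.0, 10.0, False)` yields `"decelerate"`, whereas the greedy strategy overtakes in the same situation.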

Step S150, operating the vehicle according to the vehicle kinematics model, a preset traffic light scheduling strategy, and the candidate driving strategy corresponding to the vehicle, so as to explore the candidate driving strategy set.

Specifically, in one embodiment, the step of exploring the set of candidate driving strategies is as follows:

Step S1, initializing the road and the vehicle populations.

For example, the number of vehicles and the initial distribution area are set, and the initial positions of the vehicles and the destinations of the vehicles and the like are randomly set.

Step S2, running the vehicle and traffic light simulation according to the vehicle kinematics model until all vehicles reach their destinations. The road environment (i.e., the grid state) is updated at each time step.

For example, the traffic light scheduling adopts a fixed time slice rotation algorithm, each time slice is 8 seconds or 10 seconds, and the like.
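A fixed time-slice rotation can be expressed in one line. In the sketch below, the function name `green_phase` is hypothetical and the number of signal phases (`n_phases=4`) is an assumption, since the text gives only the slice length.

```python
def green_phase(t, n_phases=4, slice_s=8.0):
    """Index of the phase that is green at time t (seconds).

    Each phase holds the green for slice_s seconds, then the next phase
    takes over, cycling through all n_phases in order.
    """
    return int(t // slice_s) % n_phases
```

With the defaults, phase 0 is green during the first 8 seconds, phase 1 during the next 8, and the cycle repeats every 32 seconds; passing `slice_s=10.0` gives the 10-second variant mentioned above.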

Step S3, recording the status of each vehicle, such as speed and accident rate, and then evaluating the vehicles according to the following reward function:

wherein:

$r(k)$ represents the reward function of the k-th vehicle;

$\eta$ represents the number of accidents caused by the k-th vehicle;

$T_k$ represents the life cycle of the k-th vehicle;

representing the speed of the kth vehicle;

$\delta_v$ represents the standard deviation of the speed of all vehicles;

$\beta_1, \beta_2$ represent the weights of the items;

Step S4, calculating the mean μ and the standard deviation δ of the overall vehicle scores, determining the number of offspring of each individual according to the reproduction rules, and starting the next generation of simulation after reproduction is complete.

In one embodiment, as shown in connection with FIG. 3, the spawning rules for the descendants are set to:

1) when the vehicle score falls within the two-standard-deviation neighborhood (2σ) to the right of the mean, or to the right of that region, two offspring are bred;

2) when the vehicle score falls within the one-standard-deviation neighborhood (1σ) to the right of the mean, two offspring are bred;

3) when the vehicle score falls to the left of the mean, no offspring are bred.
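The reproduction rules above can be sketched as a function mapping scores to offspring counts. The name `offspring_counts` and its parameters are hypothetical. Both right-hand bands default to two offspring because that is what the rules state; the counts are exposed as parameters so that a different count per band can be tried. Using the population standard deviation and treating a score exactly equal to the mean as lying on the right side are assumptions.

```python
import statistics


def offspring_counts(scores, n_within_1sd=2, n_beyond_1sd=2):
    """Number of offspring for each vehicle, given all vehicle scores.

    Scores left of the mean get 0 offspring (rule 3); scores within one
    standard deviation right of the mean get n_within_1sd (rule 2);
    scores further right get n_beyond_1sd (rule 1).
    """
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population standard deviation
    counts = []
    for s in scores:
        if s < mu:
            counts.append(0)
        elif s <= mu + sd:
            counts.append(n_within_1sd)
        else:
            counts.append(n_beyond_1sd)
    return counts
```

Because below-average vehicles leave no offspring, strategies whose members score poorly lose population share from one generation to the next.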

It should be noted that, in addition to considering the average transit time and the accident rate, the design of the reward function may also consider factors such as average emission and average energy consumption.

Step S5, terminating when the population distribution tends to converge.
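The text does not give a numerical convergence criterion. One simple possibility, shown here purely as an assumption, is to stop when the share of every strategy changes by no more than a small tolerance between consecutive generations. The names `population_ratios`, `has_converged`, and the tolerance `tol` are hypothetical.

```python
from collections import Counter


def population_ratios(strategy_labels):
    """Share of each strategy in the current generation."""
    counts = Counter(strategy_labels)
    total = len(strategy_labels)
    return {name: n / total for name, n in counts.items()}


def has_converged(prev_ratios, curr_ratios, tol=0.01):
    """True when no strategy's share moved by more than tol (assumed criterion)."""
    names = set(prev_ratios) | set(curr_ratios)
    return all(
        abs(prev_ratios.get(n, 0.0) - curr_ratios.get(n, 0.0)) <= tol
        for n in names
    )
```

A strategy that has died out is treated as having a share of zero, so extinction of a sub-population still counts as a change until the remaining shares settle.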

It is to be understood that the above-mentioned vehicle model, road model, vehicle kinematics model, traffic light scheduling, etc. are only used for explaining the present invention, and those skilled in the art can make appropriate modifications or variations to the above-mentioned models or parameters without departing from the spirit of the present invention, for example, adopting dynamic traffic light scheduling, designing other road models according to the real lane conditions, etc.

To further verify the effect of the present invention, the experimental environment is constructed according to the model and algorithm set by the present invention as follows (the implementation process can be seen in fig. 4):

(1) road model

In urban traffic, a representative traffic scenario is an intersection, which contains most elements of urban driving behavior, including lane changes, turns, overtaking, speeding, etc. On the premise of no loss of generality, a bidirectional four-lane road model is established, as shown in fig. 5. Each section of the road is 1000 metres long and each lane is 3.75 metres wide, this arrangement being in accordance with road construction standards.

(2) Vehicles

The total number of vehicles under test was set to 400, initially randomly distributed within the area described by the road model, and then divided into multiple populations according to different candidate driving strategies. The destination of each vehicle is set randomly. In the life cycle, each vehicle interacts with other individuals and roads according to its own driving strategy, thereby influencing the other individuals and promoting overall development.

(3) Traffic signal light dispatching

Traffic lights use the most common fixed time slice rotation algorithm in urban traffic, with 8 seconds per time slice.

(4) Candidate driving strategies

Candidate driving strategies include the conservative strategy, the rational strategy, and the greedy strategy; other types of driving strategies may also be set.

See table 1 below for specific experimental parameters.

TABLE 1 Experimental parameters

In a simulation environment developed locally in Python, an experiment was performed with 400 cars on a 2 km long road.

Under co-evolution and survival-of-the-fittest reproduction rules, the population distribution of the different sub-populations (each adopting a different driving strategy) changes markedly. Dominant and subordinate populations emerge after 30 generations. As shown in FIGS. 6 to 8, the experimental results show that: the sub-population adopting the conservative strategy performs worst and has the lowest average speed, so its population shrinks rapidly; the sub-population adopting the rational strategy performs best in terms of accident rate, and its average speed differs little from the highest, so its population grows rapidly; the sub-population adopting the greedy (aggressive) strategy has the highest accident rate and a slightly higher average speed. These results are instructive for designing driving policies for future intelligent transportation systems and intelligent vehicles.

On a MacBook Pro with a 2 GHz processor and 8 GB of memory, the simulation takes 3 minutes. Since the computational complexity is linear in the number of vehicles and the computation can be parallelized, the method provided by the invention can be extended to explore driving strategies for cities containing millions of autonomous cars. For example, on some supercomputers, computing one million cars takes only a few tens of seconds.
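The scaling statement can be checked with simple arithmetic. The sketch below assumes strictly linear scaling from the reported 3 minutes for 400 vehicles and picks 30 seconds as one illustrative reading of "a few tens of seconds"; the function names are hypothetical.

```python
def serial_time_s(n_vehicles, base_vehicles=400, base_time_s=180.0):
    """Estimated single-machine run time under strictly linear scaling."""
    return base_time_s * n_vehicles / base_vehicles


def required_speedup(n_vehicles, target_time_s):
    """Speedup over the baseline machine needed to finish in target_time_s."""
    return serial_time_s(n_vehicles) / target_time_s
```

Under these assumptions, one million vehicles would take about 450,000 seconds (roughly 125 hours) on the baseline laptop, so finishing in 30 seconds would need a combined speedup of about 15,000 from parallelism and faster hardware.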

It should be noted that, although the steps are described in a specific order, the steps are not necessarily executed in the specific order, and in fact, some of the steps may be executed concurrently or even in a changed order as long as the required functions are achieved.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, such as punch cards or in-groove raised structures having instructions stored thereon, and any suitable combination of the foregoing.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. An automatic driving vehicle decision emerging method based on collaborative evolution comprises the following steps:

setting a multi-objective optimization problem for the vehicle cluster, and taking a plurality of driving indexes of the controlled vehicles as optimization targets;

setting a plurality of candidate driving strategies for limiting the driving speed of the vehicle and the interaction relation between the front vehicle and the rear vehicle, and dividing the vehicle into different groups according to the candidate driving strategies;

running the vehicle according to the vehicle kinematics model, a preset traffic signal lamp scheduling strategy and a candidate driving strategy corresponding to the vehicle;

exploring the relative merits of the candidate driving strategies based on the optimization objective;

wherein in exploring the goodness of the plurality of candidate driving strategies, vehicle k is evaluated using the following reward function:

wherein:

$r(k)$ represents the reward function of the k-th vehicle;

$\eta$ represents the number of accidents caused by the k-th vehicle;

$T_k$ represents the life cycle of the k-th vehicle;

representing the speed of the kth vehicle;

$\delta_v$ represents the standard deviation of the speed of all vehicles;

$\beta_1, \beta_2$ represent the weights of the items;

representing the velocity of the kth vehicle at time t;

$\Delta t$ represents a time step;

wherein, in the process of exploring the advantages and disadvantages of the candidate driving strategies, the propagation rules of the descendants are set as follows:

when the vehicle score falls within two standard deviation neighborhoods to the right of the mean or to the right of the region, propagating two offspring;

when the vehicle score falls within the standard deviation neighborhood to the right of the mean, two offspring are bred;

when the vehicle score falls to the left of the mean, no offspring are propagated;

wherein the optimization objective is set to:

such that:

$$f_1(x) = \sum_i \sum_j g_{i,j}(t)\, g_{i,j+1}(t)$$

$$f_2(x) = -E(v(t))$$

$$\alpha_{k,t} = \alpha_{k,t-1} + \theta_{k,t}$$

$$v_{k,t} \le v_{max}$$

$$\theta_{k,t} \le \theta_{max}$$

wherein:

representing the velocity of the kth vehicle at time t;

$v_{k,t}$ represents the velocity of the k-th vehicle at time $t$;

$\alpha_{k,t}$ represents the attitude angle of the k-th vehicle at time $t$;

$\theta_{k,t}$ represents the steering wheel angle of the k-th vehicle at time $t$;

represents the acceleration of the kth vehicle at time t;

$\Delta t$ represents a time step;

$\|\cdot\|$ represents the two-norm;

$a_{max}$ represents the maximum acceleration;

$v_{max}$ represents the maximum speed;

$\theta_{max}$ represents the maximum steering wheel angle;

$g_{i,j}(t)$ represents the occupation state of the grid in the i-th row and j-th column of the road network at time $t$;

$c_{k,t}$ represents the geometric center of the k-th vehicle at time $t$;

in $f_2(x) = -E(v(t))$, $E(v(t))$ represents the overall average speed of all vehicles.

2. The collaborative evolution-based autonomous vehicle decision making emergence method according to claim 1, wherein the constructing of the road model and the setting of the initial position distribution and the driving destination of the vehicle in the road model comprises:

constructing a bidirectional four-lane road model with an intersection, wherein the road is composed of grids, each grid is a rectangle, and the length of each grid is set to be equal to the length of a vehicle;

3. The co-evolution-based autonomous vehicle decision making emergence method according to claim 1, characterized in that said plurality of candidate driving strategies comprises:

conservative strategy: the vehicle travels at the maximum speed and, when the preceding vehicle is slower, decelerates without overtaking;

rational strategy: the vehicle travels at the maximum speed and overtakes when the preceding vehicle is slow and no vehicle has entered the 20-meter zone associated with the lane change;

greedy strategy: the vehicle travels at the maximum speed and always overtakes.

4. The co-evolution-based autonomous vehicle decision making emergence method according to claim 1, characterized in that the traffic signal light scheduling strategy is a round robin of fixed time slices, each time slice being 8 seconds.

5. The co-evolution-based autonomous vehicle decision making emergence method according to claim 1, characterized in that said plurality of driving indicators comprises at least two of average transit time, accident rate, average emissions, and average energy consumption.

6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.

7. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the steps of the method of any of claims 1 to 5 are implemented when the processor executes the program.