CN113140104A - Vehicle queue tracking control method and device and computer readable storage medium


Info

Publication number
CN113140104A
Authority
CN
China
Prior art keywords
vehicle
ddpg
network
value
throttle
Prior art date
Legal status
Granted
Application number
CN202110402251.2A
Other languages
Chinese (zh)
Other versions
CN113140104B (en)
Inventor
褚端峰 (Chu Duanfeng)
徐峻伟 (Xu Junwei)
吴超仲 (Wu Chaozhong)
陆丽萍 (Lu Liping)
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110402251.2A priority Critical patent/CN113140104B/en
Publication of CN113140104A publication Critical patent/CN113140104A/en
Application granted granted Critical
Publication of CN113140104B publication Critical patent/CN113140104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/22 - Platooning, i.e. convoy of communicating vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention relates to a vehicle queue tracking control method, a device and a computer readable storage medium. The method comprises the following steps: acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors; training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network; and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity. The vehicle queue tracking control method provided by the invention improves the real-time performance and stability of vehicle queue tracking control.

Description

Vehicle queue tracking control method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of vehicle queue control, in particular to a vehicle queue tracking control method, a vehicle queue tracking control device and a computer readable storage medium.
Background
Intelligent connected vehicles are an important development direction of the current automobile industry and have a profound influence on the automobile industry and even the transportation industry as a whole. With the development of artificial intelligence algorithms and continuous breakthroughs in sensor technology, unmanned driving technology is also iterating rapidly. The development of intelligent connected vehicles is not only an important means of addressing traffic safety, resource consumption and environmental pollution, but also a core element in building intelligent travel and establishing intelligent transportation systems.
The emergence of deep reinforcement learning makes it possible to overcome the technical limitations of unmanned driving: it has strong learning ability and good robustness, and vehicle decision control methods based on deep reinforcement learning have been introduced on the basis of multi-vehicle cooperative driving. However, the real-time performance and stability of existing vehicle queue tracking control are poor.
Disclosure of Invention
In view of the above, it is desirable to provide a vehicle queue tracking control method and device, and a computer-readable storage medium, to solve the problem that the real-time performance and stability of existing vehicle queue tracking control are poor.
The invention provides a vehicle queue tracking control method, which comprises the following steps:
acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors;
training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network;
and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
Further, establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, where the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
Further, establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
Further, training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises the following steps:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
Further, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, where the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
Further, training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
Further, the reward value of the loss function is R = R1 + R2 + R3 + R4, where R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
Further, determining the output control quantity of the decision controller according to the vehicle action values and determining the throttle opening of the vehicle according to the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, where the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller. For the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
The invention also provides a vehicle queue tracking control device, which comprises a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the vehicle queue tracking control method according to any of the above technical solutions is implemented.
The invention also provides a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the vehicle queue tracking control method according to any one of the above-mentioned technical solutions.
Compared with the prior art, the invention has the following beneficial effects: the state error vectors of the self vehicle, the front vehicle and the pilot vehicle are acquired, and the policy network and the value network of the DDPG are established according to these state error vectors; the policy network and the value network of the DDPG are trained to obtain a fully trained DDPG network; the vehicle action values are obtained according to the state error vectors and the fully trained DDPG network, the output control quantity of the decision controller is determined according to the vehicle action values, and the throttle opening of the vehicle is determined according to the output control quantity; the real-time performance and stability of vehicle queue tracking control are thereby improved.
Drawings
FIG. 1 is a schematic flow chart of a vehicle queue tracking control method provided by the present invention;
FIG. 2 is a schematic diagram of a policy network of the DDPG provided by the present invention;
FIG. 3 is a schematic diagram of the structure of a value network of DDPG provided by the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Example 1
The embodiment of the invention provides a vehicle queue tracking control method, a flow schematic diagram of which is shown in figure 1, and the method comprises the following steps:
s1, acquiring state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing a policy network and a value network of DDPG (depth deterministic policy gradient) according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle; wherein the state error vector comprises the values of distance deviation, speed deviation, acceleration deviation and the like;
s2, training the strategy network and the value network of the DDPG by utilizing a Particle Swarm Optimization (PSO) to obtain a DDPG network with complete training;
and S3, obtaining vehicle action values according to the state error vectors of the own vehicle, the front vehicle and the pilot vehicle and the DDPG network with complete training, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
Preferably, establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, where the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
In one embodiment, as shown in fig. 2, the policy network of the DDPG comprises an input layer, two hidden layers and an output layer. The input layer takes the state error vectors of the self vehicle, the front vehicle and the pilot vehicle. FC layers 1 and 2 are two fully connected layers consisting of 150 and 100 neurons, respectively, and each fully connected layer uses ReLU as its activation function. The output layer directly outputs the three vehicle action values Kp, Ki and Kd (the proportional coefficient, the integral coefficient and the differential coefficient), using the sigmoid function as its activation function, and the result is converted into the action vector at and then output. Because the input state vector contains distance deviations, speed deviations and acceleration deviations, a batch normalization layer is introduced before the data of each hidden layer is passed to its activation function, converting the input data distribution into a normal distribution with mean 0 and variance 1; this keeps the inputs of each layer of the neural network in the same distribution during training and improves the generalization ability and training speed of the network.
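For illustration only (not part of the original disclosure), a minimal PyTorch sketch of a policy (actor) network with this structure could look as follows; the class name, variable names and the mapping of the sigmoid outputs to physical gain ranges are assumptions:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """DDPG actor: state error vector -> (Kp, Ki, Kd) action values."""
    def __init__(self, state_dim):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 150)  # FC layer 1, 150 neurons
        self.bn1 = nn.BatchNorm1d(150)        # batch-normalize before activation
        self.fc2 = nn.Linear(150, 100)        # FC layer 2, 100 neurons
        self.bn2 = nn.BatchNorm1d(100)
        self.out = nn.Linear(100, 3)          # outputs Kp, Ki, Kd

    def forward(self, state):
        x = torch.relu(self.bn1(self.fc1(state)))
        x = torch.relu(self.bn2(self.fc2(x)))
        # sigmoid keeps each gain in (0, 1); any rescaling to physical
        # gain ranges would be an application-specific assumption
        return torch.sigmoid(self.out(x))
```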
Preferably, establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
In one embodiment, the overall structure of the value network of the DDPG is similar to that of the policy network, except that the value network adds an action input; a schematic diagram of the structure of the value network of the DDPG is shown in fig. 3. The state error vectors of the self vehicle, the front vehicle and the pilot vehicle are first taken as input; after passing through the first fully connected layer FC layer 1, they are input, together with the action input vector (the three vehicle action values), to the second fully connected layer FC layer 2; after the third fully connected layer FC layer 3, the Q value (the evaluation value of the optimal action) is output. The hidden layers of the value network consist of 150, 200 and 100 neurons, respectively; FC layers 1 and 3 both use ReLU as their activation function, while FC layer 2 uses a linear activation function. As in the policy network, a batch normalization layer is introduced before the data of each hidden layer is passed to its activation function, converting the input data distribution into a normal distribution.
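A matching sketch of the value (critic) network under the same assumptions; the exact placement of batch normalization is an assumption not fixed by the text above:

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """DDPG critic: (state error vector, action) -> Q value."""
    def __init__(self, state_dim, action_dim=3):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 150)         # FC layer 1 (ReLU), state only
        self.bn1 = nn.BatchNorm1d(150)
        self.fc2 = nn.Linear(150 + action_dim, 200)  # FC layer 2 (linear), state + action
        self.fc3 = nn.Linear(200, 100)               # FC layer 3 (ReLU)
        self.q_out = nn.Linear(100, 1)               # scalar Q value

    def forward(self, state, action):
        x = torch.relu(self.bn1(self.fc1(state)))
        x = self.fc2(torch.cat([x, action], dim=1))  # linear activation (identity)
        x = torch.relu(self.fc3(x))
        return self.q_out(x)
```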
Preferably, training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises the following steps:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
In one embodiment, the particle dimension in the PSO algorithm is specifically expressed as
H = hi + hi×hj1 + hj1×hj2 + hj2×hk + hk
where H is the particle dimension, hi is the number of input-layer nodes, hj1 is the number of nodes of the first hidden layer, hj2 is the number of nodes of the second hidden layer, and hk is the number of output-layer nodes.
The fitness function of the PSO (particle swarm optimization) algorithm is the error between the predicted values of the DDPG neural network and the corresponding actual values over the training samples (its expression is given as an image in the original publication), where M is the number of samples, N is the particle dimension, yij is the predicted value of the DDPG neural network for sample i, and ŷij is the corresponding actual value.
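For illustration, a small sketch of these two quantities; since the fitness expression above is not reproduced, the mean-squared-error form used here (and the function names) are assumptions:

```python
import numpy as np

def particle_dimension(h_i, h_j1, h_j2, h_k):
    # H = h_i + h_i*h_j1 + h_j1*h_j2 + h_j2*h_k + h_k, as in the embodiment
    return h_i + h_i * h_j1 + h_j1 * h_j2 + h_j2 * h_k + h_k

def mse_fitness(y_pred, y_true):
    # assumed fitness: mean squared error between DDPG predictions and actual values
    m = y_pred.shape[0]
    return np.sum((np.asarray(y_pred) - np.asarray(y_true)) ** 2) / m
```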
Preferably, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, where the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
In one embodiment, to allow a particle to jump out of a locally poor region and give the PSO algorithm stronger global optimization capability, the inertia weight factor is improved. The updated inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight ω, 0.1 ≤ m + h ≤ 0.9, h > 0, T is the maximum number of iterations of the PSO algorithm, and t is the current iteration number.
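For illustration, a minimal PSO loop using this improved inertia weight is sketched below; the population size, the coefficients c1 and c2, and the example values m = 0.4, h = 0.5 (chosen so that 0.1 ≤ m + h ≤ 0.9) are assumptions, and the fitness function mapping a weight vector to an error value is supplied by the caller:

```python
import numpy as np

def pso_optimize(fitness, dim, n_particles=30, T=200, c1=2.0, c2=2.0, m=0.4, h=0.5):
    """PSO search for DDPG connection weights with w = m + h*log_T(T - t - 1)."""
    pos = np.random.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for t in range(T):
        # improved inertia weight, guarded against log(0) at the last iteration
        w = m + h * np.log(max(T - t - 1, 1e-9)) / np.log(T)
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest  # best DDPG connection-weight vector found
```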
Preferably, training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
In another specific embodiment, the DDPG adopts a deterministic policy gradient based on the Q value; compared with the stochastic policy gradient, this removes the integral over actions and instead uses the derivative of the return (Q) function with respect to the action. The deterministic policy gradient based on the Q value is specifically expressed as
∇θμ J ≈ (1/N) Σi [ ∇a Q(s, a | θQ)|s=si, a=μ(si) · ∇θμ μ(s | θμ)|s=si ]
where μθ(s) is the policy (Actor) network, Q(s, a | θQ) is the value (Critic) network, and N is the number of samples. The value network performs its gradient update based on a loss function, which is specifically expressed as
L(θQ) = (1/N) Σi (yi - Q(si, ai | θQ))², with yi = ri + γ·Q′(si+1, μ′(si+1 | θμ′) | θQ′)
where ri is the reward value of the ith sample, γ is the discount factor, and θμ′ and θQ′ are the weights of the target policy network and the target value network, respectively.
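A condensed sketch of one DDPG update step corresponding to these two formulas is given below for illustration; the network and optimizer objects, the replay-batch format, and the discount value are assumptions, and the PSO-based weight initialization described above is not shown:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG update on a sampled minibatch (s, a, r, s_next)."""
    s, a, r, s_next = batch

    # critic (value network): minimize L = 1/N * sum (y_i - Q(s_i, a_i))^2
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # actor (policy network): deterministic policy gradient, ascend Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```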
In one specific embodiment, the DDPG exploration strategy is optimized with an Ornstein-Uhlenbeck (OU) random process; the temporally correlated exploration generated by the OU process improves the exploration efficiency of control tasks in inertial systems. The decision mechanism for the action is specifically expressed as
at = μ(st | θμ) + xt
where xt is the OU random process, whose noise effect is gradually reduced as training progresses.
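For illustration, a minimal sketch of the OU noise term xt follows; the θ and σ values are typical defaults and are assumptions, as is the schedule by which the noise is decayed over training (not shown):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise x_t added to mu(s_t)."""
    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.x = np.full(dim, mu)

    def sample(self, dt=1.0):
        # dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = (self.theta * (self.mu - self.x) * dt
              + self.sigma * np.sqrt(dt) * np.random.randn(*self.x.shape))
        self.x = self.x + dx
        return self.x
```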
Preferably, the reward value of the loss function is R = R1 + R2 + R3 + R4, where R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
In one embodiment, the reward function appearing in the loss function used for the gradient update needs to be designed. The primary control objective of the vehicle controller is to ensure that the vehicles do not collide, so the first reward term R1 is defined in terms of x2, x3 and Lsafe to penalize collisions (its exact expression is given as an image in the original publication and is not reproduced here), where x2 and x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively, and Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary.
Considering that, in a vehicle formation, changes in the speed of the front vehicle have a great influence on the controller of the controlled vehicle (i.e. the self vehicle), the corresponding reward term is specifically expressed as:
R2 = -ω1|v2 - v3|
where ω1 is the weight of the speed error and is a positive value, and v2 - v3 is the speed error between the controlled vehicle and the front vehicle. Considering that the distance between the self vehicle and the front vehicle at the next moment should be closer to the desired distance than in the current state, the corresponding reward term is specifically expressed as:
R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|
where ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t and is a positive value, ω3 is the weight of the inter-vehicle distance error at time t and is a positive value, Δxt = x2 - x3 - hv3 - Lcar - Lsafe is the distance error between the self vehicle and the front vehicle at time t, and Δxt-1 is the same quantity evaluated at time t-1.
The stability of the vehicle formation at high speed is considered together with ride comfort, so the corresponding reward term R4 is defined from the acceleration weight ω4 of the controlled vehicle (a positive value) and the controller output |a| of the controlled vehicle (its exact expression is given as an image in the original publication and is not reproduced here). The comprehensive reward function is specifically expressed as
R = R1 + R2 + R3 + R4
the comprehensive reward function is the reward value in the loss function.
Preferably, determining the output control quantity of the decision controller according to the vehicle action values and determining the throttle opening of the vehicle according to the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, where the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller. For the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
In one embodiment, the controller for vehicle queue tracking control is formed by combining feedforward control based on an inverse vehicle dynamics model with feedback control based on a PID controller, and the throttle opening of the vehicle can be expressed as
uthrottle = uthrottle,f + uthrottle,b
The feedforward controller for vehicle queue tracking control is based on the inverse vehicle dynamics model. A car on the Benz E-class platform in Carsim is selected, and the relation between the theoretical engine speed and torque can be expressed as:
Ttq(ω) = 2.1774×10^-9·ω^5 - 2.9646×10^-6·ω^4 + 1.2×10^-3·ω^3 - 4.87×10^-2·ω^2 - 42.5986·ω + 4.9668×10^3
the desired output torque of the engine may be expressed as
Figure BDA0003020755130000111
where Te is the engine desired torque, N·m; ig and io are the transmission ratio and the final drive ratio, respectively; τ is the torque characteristic function of the hydraulic torque converter; r is the rolling radius of the wheel, m; m is the total vehicle mass, kg; CD and f are the air resistance coefficient and the rolling resistance coefficient, respectively; A is the frontal area of the vehicle, i.e. the projected area of the vehicle in the driving direction, m²; Vx is the longitudinal running speed of the vehicle, km/h; ades is the desired longitudinal acceleration of the vehicle, m/s²; and ηT and δ are the mechanical efficiency of the drive train and the rotating-mass conversion factor of the vehicle, respectively.
The feedforward control quantity can be expressed as:
uthrottle,f = Te / Ttq(ω)
the vehicle queue tracking control feedback controller takes the PID controller as the basis, and considers the deceleration of the pilot vehicle, the speed deviation between the self vehicle and the pilot vehicle and the inter-vehicle distance deviation, the output control quantity of the decision controller of the 2 nd vehicle (front vehicle) can be expressed as,
uthrottle.b2=kd(a1-a2)+kp(v1-v2)+kie2 1=kd(a1-a2)+kp(v1-v2)+ki(x1-x2-hv2-L)
wherein x isi、xi-1The position information of the ith and the (i-1) th vehicles, m, respectively; v. ofiIs the speed of the ith vehicle, m/s; h is the time interval between workshops, s; l is the vehicle length, m; kp、Ki、KdRespectively, a proportionality coefficient, an integral coefficient and a differential coefficient.
The output control quantity of the decision controller of the 3rd vehicle (the self vehicle) can be expressed as
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki·e32) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki·e31)
= λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where vi is the speed of the ith vehicle, m/s; h is the inter-vehicle time headway, s; Lcar is the vehicle length, m; Lsafe is the minimum inter-vehicle distance to be kept when the two vehicles are stationary, m; and Kp, Ki, Kd are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
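For illustration, a sketch of the combined feedforward-plus-feedback throttle command for the 3rd (self) vehicle is given below; the desired torque Te is assumed to come from the inverse dynamics model above, the indexing of the pilot/front/self vehicles in the arrays and the values of λ1, λ2 are assumptions, and only the engine map coefficients are taken from the embodiment:

```python
def theoretical_torque(w):
    # engine speed-torque map T_tq(w) from the Carsim Benz E-class embodiment
    return (2.1774e-9 * w**5 - 2.9646e-6 * w**4 + 1.2e-3 * w**3
            - 4.87e-2 * w**2 - 42.5986 * w + 4.9668e3)

def throttle_self_vehicle(T_e, w, a, v, x, kp, ki, kd,
                          h, L_car, L_safe, lam1=0.5, lam2=0.5):
    """u = u_f + u_b for the 3rd vehicle; indices 0/1/2 = pilot/front/self."""
    u_f = T_e / theoretical_torque(w)                             # feedforward: desired / theoretical torque
    e_front = x[1] - x[2] - h * v[2] - L_car - L_safe             # spacing error to the front vehicle
    e_pilot = x[0] - x[2] - 2 * h * v[2] - 2 * L_car - 2 * L_safe # spacing error to the pilot vehicle
    u_b = (lam1 * (kd * (a[1] - a[2]) + kp * (v[1] - v[2]) + ki * e_front)
           + lam2 * (kd * (a[0] - a[2]) + kp * (v[0] - v[2]) + ki * e_pilot))
    return u_f + u_b
```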
Example 2
The embodiment of the invention provides a vehicle queue tracking control device, which comprises a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the vehicle queue tracking control method is realized as in embodiment 1.
Example 3
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor, and the computer program implements the vehicle queue tracking control method according to embodiment 1.
The invention discloses a vehicle queue tracking control method and device and a computer readable storage medium. The state error vectors of the self vehicle, the front vehicle and the pilot vehicle are acquired, and the policy network and the value network of the DDPG (deep deterministic policy gradient) are established according to these state error vectors; the policy network and the value network of the DDPG are trained to obtain a fully trained DDPG network; the vehicle action values are obtained according to the state error vectors and the fully trained DDPG network, the output control quantity of the decision controller is determined according to the vehicle action values, and the throttle opening of the vehicle is determined according to the output control quantity; the real-time performance and stability of vehicle queue tracking control are thereby improved.
In the above technical solution, for the scenario in which vehicles travel in a queue, it is taken into account that the DDPG network easily falls into a locally optimal solution during training. Based on its population-search characteristic, the PSO algorithm is used to generate a large amount of experience containing implicit temporal-distribution attributes, and the individuals trained by the gradient-based DRL method in turn guide the search direction of the PSO algorithm, so that the PSO algorithm can handle the sparse-reward reinforcement learning problem and converge more quickly. The inertia weight factor is improved so that particles can jump out of locally poor regions, giving the PSO algorithm strong global optimization capability. The DDPG network training method in the technical solution of the invention has the advantages of comprehensive consideration, simple and convenient calculation, fast operation, and high reliability.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A vehicle queue tracking control method is characterized by comprising the following steps:
acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors;
training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network;
and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
2. The vehicle queue tracking control method according to claim 1, wherein establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, wherein the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
3. The vehicle queue tracking control method according to claim 2, wherein establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
4. The vehicle queue tracking control method according to claim 1, wherein training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
5. The vehicle queue tracking control method according to claim 4, wherein iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, wherein the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
wherein m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
6. The vehicle queue tracking control method according to claim 4, wherein training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
7. The vehicle queue tracking control method according to claim 6, wherein the reward value of the loss function is R = R1 + R2 + R3 + R4, wherein R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
8. The vehicle queue tracking control method according to claim 1, wherein determining the output control quantity of the decision controller based on the vehicle action values and determining the throttle opening of the vehicle based on the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, wherein the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller; for the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
wherein x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
9. A vehicle queue tracking control apparatus comprising a processor and a memory, the memory having stored thereon a computer program that, when executed by the processor, implements the vehicle queue tracking control method according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the vehicle queue tracking control method according to any one of claims 1 to 8.
CN202110402251.2A 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium Active CN113140104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110402251.2A CN113140104B (en) 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN113140104A true CN113140104A (en) 2021-07-20
CN113140104B CN113140104B (en) 2022-06-21

Family

ID=76812585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110402251.2A Active CN113140104B (en) 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113140104B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
CN110032189A (en) * 2019-04-22 2019-07-19 河海大学常州校区 A kind of intelligent storage method for planning path for mobile robot not depending on map
CN110329257A (en) * 2019-06-24 2019-10-15 武汉理工大学 A kind of more longitudinally controlled methods of car team team based on Che-Che Tongxin
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110989577A (en) * 2019-11-15 2020-04-10 深圳先进技术研究院 Automatic driving decision method and automatic driving device of vehicle
CN112162555A (en) * 2020-09-23 2021-01-01 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AHMAD PARVARESH et al.: "A Novel Deep Learning Backstepping Controller-Based Digital Twins Technology for Pitch Angle Control of Variable Speed Wind Turbine", MDPI
QIHAO LIU et al.: "PNS: Population-Guided Novelty Search for Reinforcement Learning in Hard Exploration Environments"
XU Yang et al.: "A Unified Modeling Method for Trajectory Planning and Tracking Control of Unmanned Vehicles" (无人车辆轨迹规划与跟踪控制的统一建模方法), Acta Automatica Sinica (自动化学报)
LUO Ying et al.: "Decision-Making for Low-Speed Car-Following Behavior of Vehicles Based on an Improved DDPG Algorithm" (基于改进DDPG算法的车辆低速跟驰行为决策研究), Measurement & Control Technology (测控技术)
LU Huaxiang et al.: "Particle Swarm Optimization Based on Deep Deterministic Policy Gradient" (基于深度确定性策略梯度的粒子群算法), Journal of University of Electronic Science and Technology of China (电子科技大学学报)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114089633A (en) * 2021-11-19 2022-02-25 江苏科技大学 Multi-motor coupling drive control device and method for underwater robot
CN114089633B (en) * 2021-11-19 2024-04-26 江苏科技大学 Multi-motor coupling driving control device and method for underwater robot

Also Published As

Publication number Publication date
CN113140104B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN112193280B (en) Heavy-load train reinforcement learning control method and system
Xu et al. Look-ahead prediction-based real-time optimal energy management for connected HEVs
CN110992695B (en) Vehicle urban intersection traffic decision multi-objective optimization method based on conflict resolution
CN106740846A (en) A kind of electric automobile self-adapting cruise control method of double mode switching
CN103324085A (en) Optimal control method based on supervised reinforcement learning
Ding et al. Driving strategy of connected and autonomous vehicles based on multiple preceding vehicles state estimation in mixed vehicular traffic
CN114312830A (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN113140104B (en) Vehicle queue tracking control method and device and computer readable storage medium
Wang et al. Lane keeping assist for an autonomous vehicle based on deep reinforcement learning
Zhao et al. Supervised adaptive dynamic programming based adaptive cruise control
Selvaraj et al. An ML-aided reinforcement learning approach for challenging vehicle maneuvers
Dubey et al. Autonomous braking and throttle system: A deep reinforcement learning approach for naturalistic driving
Debarshi et al. Robust EMRAN-aided coupled controller for autonomous vehicles
Lin et al. Adaptive prediction-based control for an ecological cruise control system on curved and hilly roads
Kerbel et al. Driver assistance eco-driving and transmission control with deep reinforcement learning
Guo et al. Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control
Fehér et al. Proving ground test of a ddpg-based vehicle trajectory planner
Hailemichael et al. Safe reinforcement learning for an energy-efficient driver assistance system
CN114997048A (en) Automatic driving vehicle lane keeping method based on TD3 algorithm improved by exploration strategy
Chen et al. Decision making for overtaking of unmanned vehicle based on deep Q-learning
CN113759701A (en) High-speed train speed control method and system
CN114228690A (en) Automatic driving vehicle roll control method based on DDPG and iterative control
CN111857112B (en) Automobile local path planning method and electronic equipment
Sun Cooperative adaptive cruise control performance analysis
Zhuang et al. Model-Predictive-Control-Based Simultaneous Trajectory Tracking and Speed Control for Intelligent Vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant