CN113140104A - Vehicle queue tracking control method and device and computer readable storage medium


Info

Publication number
CN113140104A
Authority
CN
China
Prior art keywords
vehicle
ddpg
network
value
throttle
Prior art date
Legal status
Granted
Application number
CN202110402251.2A
Other languages
Chinese (zh)
Other versions
CN113140104B (en)
Inventor
褚端峰 (Chu Duanfeng)
徐峻伟 (Xu Junwei)
吴超仲 (Wu Chaozhong)
陆丽萍 (Lu Liping)
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110402251.2A priority Critical patent/CN113140104B/en
Publication of CN113140104A publication Critical patent/CN113140104A/en
Application granted granted Critical
Publication of CN113140104B publication Critical patent/CN113140104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/22 - Platooning, i.e. convoy of communicating vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention relates to a vehicle queue tracking control method, a device and a computer readable storage medium. The method comprises the following steps: acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors; training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network; and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity. The vehicle queue tracking control method provided by the invention improves the real-time performance and stability of vehicle queue tracking control.

Description

Vehicle queue tracking control method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of vehicle queue control, in particular to a vehicle queue tracking control method, a vehicle queue tracking control device and a computer readable storage medium.
Background
Intelligent connected vehicles are an important development direction of the current automobile industry and have a profound influence on the automobile industry and even the transportation industry as a whole. With the development of artificial intelligence algorithms and continuous breakthroughs in sensor technology, unmanned driving technology is also iterating rapidly. The development of intelligent connected vehicles is not only an important means of addressing traffic safety, resource consumption and environmental pollution, but also a core element in building intelligent travel and establishing intelligent transportation systems.
The emergence of deep reinforcement learning makes it possible to overcome the technical limitations of unmanned driving: it has strong learning ability and good robustness, and vehicle decision control methods based on deep reinforcement learning have been introduced on the basis of multi-vehicle cooperative driving. However, the real-time performance and stability of existing vehicle queue tracking control are poor.
Disclosure of Invention
In view of the above, it is desirable to provide a vehicle queue tracking control method and device, and a computer-readable storage medium, to solve the problem that the real-time performance and stability of existing vehicle queue tracking control are poor.
The invention provides a vehicle queue tracking control method, which comprises the following steps:
acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors;
training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network;
and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
Further, establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, where the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
Further, establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
Further, training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises the following steps:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
Further, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, where the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
Further, training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
Further, the reward value of the loss function is R = R1 + R2 + R3 + R4, where R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
Further, determining the output control quantity of the decision controller according to the vehicle action values and determining the throttle opening of the vehicle according to the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, where the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller. For the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
The invention also provides a vehicle queue tracking control device, which comprises a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the vehicle queue tracking control method according to any of the above technical solutions is implemented.
The invention also provides a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the vehicle queue tracking control method according to any one of the above-mentioned technical solutions.
Compared with the prior art, the invention has the following beneficial effects: the state error vectors of the self vehicle, the front vehicle and the pilot vehicle are acquired, and the policy network and the value network of the DDPG are established according to these state error vectors; the policy network and the value network of the DDPG are trained to obtain a fully trained DDPG network; the vehicle action values are obtained according to the state error vectors and the fully trained DDPG network, the output control quantity of the decision controller is determined according to the vehicle action values, and the throttle opening of the vehicle is determined according to the output control quantity; the real-time performance and stability of vehicle queue tracking control are thereby improved.
Drawings
FIG. 1 is a schematic flow chart of a vehicle queue tracking control method provided by the present invention;
FIG. 2 is a schematic diagram of a policy network of the DDPG provided by the present invention;
FIG. 3 is a schematic diagram of the structure of a value network of DDPG provided by the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Example 1
The embodiment of the invention provides a vehicle queue tracking control method, a flow schematic diagram of which is shown in figure 1, and the method comprises the following steps:
s1, acquiring state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing a policy network and a value network of DDPG (depth deterministic policy gradient) according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle; wherein the state error vector comprises the values of distance deviation, speed deviation, acceleration deviation and the like;
s2, training the strategy network and the value network of the DDPG by utilizing a Particle Swarm Optimization (PSO) to obtain a DDPG network with complete training;
and S3, obtaining vehicle action values according to the state error vectors of the own vehicle, the front vehicle and the pilot vehicle and the DDPG network with complete training, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
Preferably, establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, where the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
In one embodiment, as shown in fig. 2, the policy network of the DDPG comprises an input layer, two hidden layers and an output layer. The input layer takes the state error vectors of the self vehicle, the front vehicle and the pilot vehicle. FC layers 1 and 2 are two fully connected layers consisting of 150 and 100 neurons, respectively, and each fully connected layer uses ReLU as its activation function. The output layer directly outputs the three vehicle action values Kp, Ki and Kd (the proportional coefficient, the integral coefficient and the differential coefficient), using the sigmoid function as its activation function, and the result is converted into the action vector at and then output. Because the input state vector contains distance deviations, speed deviations and acceleration deviations, a batch normalization layer is introduced before the data of each hidden layer is passed to its activation function, converting the input data distribution into a normal distribution with mean 0 and variance 1; this keeps the inputs of each layer of the neural network in the same distribution during training and improves the generalization ability and training speed of the network.
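For illustration only (not part of the original disclosure), a minimal PyTorch sketch of a policy (actor) network with this structure could look as follows; the class name, variable names and the mapping of the sigmoid outputs to physical gain ranges are assumptions:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """DDPG actor: state error vector -> (Kp, Ki, Kd) action values."""
    def __init__(self, state_dim):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 150)  # FC layer 1, 150 neurons
        self.bn1 = nn.BatchNorm1d(150)        # batch-normalize before activation
        self.fc2 = nn.Linear(150, 100)        # FC layer 2, 100 neurons
        self.bn2 = nn.BatchNorm1d(100)
        self.out = nn.Linear(100, 3)          # outputs Kp, Ki, Kd

    def forward(self, state):
        x = torch.relu(self.bn1(self.fc1(state)))
        x = torch.relu(self.bn2(self.fc2(x)))
        # sigmoid keeps each gain in (0, 1); any rescaling to physical
        # gain ranges would be an application-specific assumption
        return torch.sigmoid(self.out(x))
```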
Preferably, establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
In one embodiment, the overall structure of the value network of the DDPG is similar to that of the policy network, except that the value network adds an action input; a schematic diagram of the structure of the value network of the DDPG is shown in fig. 3. The state error vectors of the self vehicle, the front vehicle and the pilot vehicle are first taken as input; after passing through the first fully connected layer FC layer 1, they are input, together with the action input vector (the three vehicle action values), to the second fully connected layer FC layer 2; after the third fully connected layer FC layer 3, the Q value (the evaluation value of the optimal action) is output. The hidden layers of the value network consist of 150, 200 and 100 neurons, respectively; FC layers 1 and 3 both use ReLU as their activation function, while FC layer 2 uses a linear activation function. As in the policy network, a batch normalization layer is introduced before the data of each hidden layer is passed to its activation function, converting the input data distribution into a normal distribution.
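A matching sketch of the value (critic) network under the same assumptions; the exact placement of batch normalization is an assumption not fixed by the text above:

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """DDPG critic: (state error vector, action) -> Q value."""
    def __init__(self, state_dim, action_dim=3):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 150)         # FC layer 1 (ReLU), state only
        self.bn1 = nn.BatchNorm1d(150)
        self.fc2 = nn.Linear(150 + action_dim, 200)  # FC layer 2 (linear), state + action
        self.fc3 = nn.Linear(200, 100)               # FC layer 3 (ReLU)
        self.q_out = nn.Linear(100, 1)               # scalar Q value

    def forward(self, state, action):
        x = torch.relu(self.bn1(self.fc1(state)))
        x = self.fc2(torch.cat([x, action], dim=1))  # linear activation (identity)
        x = torch.relu(self.fc3(x))
        return self.q_out(x)
```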
Preferably, training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises the following steps:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
In one embodiment, the particle dimension in the PSO algorithm is specifically expressed as
H = hi + hi×hj1 + hj1×hj2 + hj2×hk + hk
where H is the particle dimension, hi is the number of input-layer nodes, hj1 is the number of nodes of the first hidden layer, hj2 is the number of nodes of the second hidden layer, and hk is the number of output-layer nodes.
The fitness function of the PSO (particle swarm optimization) algorithm is the error between the predicted values of the DDPG neural network and the corresponding actual values over the training samples (its expression is given as an image in the original publication), where M is the number of samples, N is the particle dimension, yij is the predicted value of the DDPG neural network for sample i, and ŷij is the corresponding actual value.
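For illustration, a small sketch of these two quantities; since the fitness expression above is not reproduced, the mean-squared-error form used here (and the function names) are assumptions:

```python
import numpy as np

def particle_dimension(h_i, h_j1, h_j2, h_k):
    # H = h_i + h_i*h_j1 + h_j1*h_j2 + h_j2*h_k + h_k, as in the embodiment
    return h_i + h_i * h_j1 + h_j1 * h_j2 + h_j2 * h_k + h_k

def mse_fitness(y_pred, y_true):
    # assumed fitness: mean squared error between DDPG predictions and actual values
    m = y_pred.shape[0]
    return np.sum((np.asarray(y_pred) - np.asarray(y_true)) ** 2) / m
```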
Preferably, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, where the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
In one embodiment, to allow a particle to jump out of a locally poor region and give the PSO algorithm stronger global optimization capability, the inertia weight factor is improved. The updated inertia weight factor is
ω = m + h·logT(T - t - 1)
where m is the convergence value of the inertia weight ω, 0.1 ≤ m + h ≤ 0.9, h > 0, T is the maximum number of iterations of the PSO algorithm, and t is the current iteration number.
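For illustration, a minimal PSO loop using this improved inertia weight is sketched below; the population size, the coefficients c1 and c2, and the example values m = 0.4, h = 0.5 (chosen so that 0.1 ≤ m + h ≤ 0.9) are assumptions, and the fitness function mapping a weight vector to an error value is supplied by the caller:

```python
import numpy as np

def pso_optimize(fitness, dim, n_particles=30, T=200, c1=2.0, c2=2.0, m=0.4, h=0.5):
    """PSO search for DDPG connection weights with w = m + h*log_T(T - t - 1)."""
    pos = np.random.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for t in range(T):
        # improved inertia weight, guarded against log(0) at the last iteration
        w = m + h * np.log(max(T - t - 1, 1e-9)) / np.log(T)
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest  # best DDPG connection-weight vector found
```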
Preferably, training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
In another specific embodiment, the DDPG adopts a deterministic policy gradient based on the Q value; compared with the stochastic policy gradient, this removes the integral over actions and instead uses the derivative of the return (Q) function with respect to the action. The deterministic policy gradient based on the Q value is specifically expressed as
∇θμ J ≈ (1/N) Σi [ ∇a Q(s, a | θQ)|s=si, a=μ(si) · ∇θμ μ(s | θμ)|s=si ]
where μθ(s) is the policy (Actor) network, Q(s, a | θQ) is the value (Critic) network, and N is the number of samples. The value network performs its gradient update based on a loss function, which is specifically expressed as
L(θQ) = (1/N) Σi (yi - Q(si, ai | θQ))², with yi = ri + γ·Q′(si+1, μ′(si+1 | θμ′) | θQ′)
where ri is the reward value of the ith sample, γ is the discount factor, and θμ′ and θQ′ are the weights of the target policy network and the target value network, respectively.
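A condensed sketch of one DDPG update step corresponding to these two formulas is given below for illustration; the network and optimizer objects, the replay-batch format, and the discount value are assumptions, and the PSO-based weight initialization described above is not shown:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG update on a sampled minibatch (s, a, r, s_next)."""
    s, a, r, s_next = batch

    # critic (value network): minimize L = 1/N * sum (y_i - Q(s_i, a_i))^2
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # actor (policy network): deterministic policy gradient, ascend Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```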
In one specific embodiment, the DDPG exploration strategy is optimized with an Ornstein-Uhlenbeck (OU) random process; the temporally correlated exploration generated by the OU process improves the exploration efficiency of control tasks in inertial systems. The decision mechanism for the action is specifically expressed as
at = μ(st | θμ) + xt
where xt is the OU random process, whose noise effect is gradually reduced as training progresses.
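For illustration, a minimal sketch of the OU noise term xt follows; the θ and σ values are typical defaults and are assumptions, as is the schedule by which the noise is decayed over training (not shown):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise x_t added to mu(s_t)."""
    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.x = np.full(dim, mu)

    def sample(self, dt=1.0):
        # dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = (self.theta * (self.mu - self.x) * dt
              + self.sigma * np.sqrt(dt) * np.random.randn(*self.x.shape))
        self.x = self.x + dx
        return self.x
```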
Preferably, the reward value of the loss function is R = R1 + R2 + R3 + R4, where R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
In one embodiment, the reward function appearing in the loss function used for the gradient update needs to be designed. The primary control objective of the vehicle controller is to ensure that the vehicles do not collide, so the first reward term R1 is defined in terms of x2, x3 and Lsafe to penalize collisions (its exact expression is given as an image in the original publication and is not reproduced here), where x2 and x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively, and Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary.
Considering that, in a vehicle formation, changes in the speed of the front vehicle have a great influence on the controller of the controlled vehicle (i.e. the self vehicle), the corresponding reward term is specifically expressed as:
R2 = -ω1|v2 - v3|
where ω1 is the weight of the speed error and is a positive value, and v2 - v3 is the speed error between the controlled vehicle and the front vehicle. Considering that the distance between the self vehicle and the front vehicle at the next moment should be closer to the desired distance than in the current state, the corresponding reward term is specifically expressed as:
R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|
where ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t and is a positive value, ω3 is the weight of the inter-vehicle distance error at time t and is a positive value, Δxt = x2 - x3 - hv3 - Lcar - Lsafe is the distance error between the self vehicle and the front vehicle at time t, and Δxt-1 is the same quantity evaluated at time t-1.
The stability of the vehicle formation at high speed is considered together with ride comfort, so the corresponding reward term R4 is defined from the acceleration weight ω4 of the controlled vehicle (a positive value) and the controller output |a| of the controlled vehicle (its exact expression is given as an image in the original publication and is not reproduced here). The comprehensive reward function is specifically expressed as
R = R1 + R2 + R3 + R4
the comprehensive reward function is the reward value in the loss function.
Preferably, determining the output control quantity of the decision controller according to the vehicle action values and determining the throttle opening of the vehicle according to the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, where the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller. For the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
In one embodiment, the controller for vehicle queue tracking control is formed by combining feedforward control based on an inverse vehicle dynamics model with feedback control based on a PID controller, and the throttle opening of the vehicle can be expressed as
uthrottle = uthrottle,f + uthrottle,b
The feedforward controller for vehicle queue tracking control is based on the inverse vehicle dynamics model. A car on the Benz E-class platform in Carsim is selected, and the relation between the theoretical engine speed and torque can be expressed as:
Ttq(ω) = 2.1774×10^-9·ω^5 - 2.9646×10^-6·ω^4 + 1.2×10^-3·ω^3 - 4.87×10^-2·ω^2 - 42.5986·ω + 4.9668×10^3
the desired output torque of the engine may be expressed as
Figure BDA0003020755130000111
where Te is the engine desired torque, N·m; ig and io are the transmission ratio and the final drive ratio, respectively; τ is the torque characteristic function of the hydraulic torque converter; r is the rolling radius of the wheel, m; m is the total vehicle mass, kg; CD and f are the air resistance coefficient and the rolling resistance coefficient, respectively; A is the frontal area of the vehicle, i.e. the projected area of the vehicle in the driving direction, m²; Vx is the longitudinal running speed of the vehicle, km/h; ades is the desired longitudinal acceleration of the vehicle, m/s²; and ηT and δ are the mechanical efficiency of the drive train and the rotating-mass conversion factor of the vehicle, respectively.
The feedforward control quantity can be expressed as:
uthrottle,f = Te / Ttq(ω)
the vehicle queue tracking control feedback controller takes the PID controller as the basis, and considers the deceleration of the pilot vehicle, the speed deviation between the self vehicle and the pilot vehicle and the inter-vehicle distance deviation, the output control quantity of the decision controller of the 2 nd vehicle (front vehicle) can be expressed as,
uthrottle.b2=kd(a1-a2)+kp(v1-v2)+kie2 1=kd(a1-a2)+kp(v1-v2)+ki(x1-x2-hv2-L)
wherein x isi、xi-1The position information of the ith and the (i-1) th vehicles, m, respectively; v. ofiIs the speed of the ith vehicle, m/s; h is the time interval between workshops, s; l is the vehicle length, m; kp、Ki、KdRespectively, a proportionality coefficient, an integral coefficient and a differential coefficient.
The output control quantity of the decision controller of the 3rd vehicle (the self vehicle) can be expressed as
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki·e32) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki·e31)
= λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
where vi is the speed of the ith vehicle, m/s; h is the inter-vehicle time headway, s; Lcar is the vehicle length, m; Lsafe is the minimum inter-vehicle distance to be kept when the two vehicles are stationary, m; and Kp, Ki, Kd are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
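For illustration, a sketch of the combined feedforward-plus-feedback throttle command for the 3rd (self) vehicle is given below; the desired torque Te is assumed to come from the inverse dynamics model above, the indexing of the pilot/front/self vehicles in the arrays and the values of λ1, λ2 are assumptions, and only the engine map coefficients are taken from the embodiment:

```python
def theoretical_torque(w):
    # engine speed-torque map T_tq(w) from the Carsim Benz E-class embodiment
    return (2.1774e-9 * w**5 - 2.9646e-6 * w**4 + 1.2e-3 * w**3
            - 4.87e-2 * w**2 - 42.5986 * w + 4.9668e3)

def throttle_self_vehicle(T_e, w, a, v, x, kp, ki, kd,
                          h, L_car, L_safe, lam1=0.5, lam2=0.5):
    """u = u_f + u_b for the 3rd vehicle; indices 0/1/2 = pilot/front/self."""
    u_f = T_e / theoretical_torque(w)                             # feedforward: desired / theoretical torque
    e_front = x[1] - x[2] - h * v[2] - L_car - L_safe             # spacing error to the front vehicle
    e_pilot = x[0] - x[2] - 2 * h * v[2] - 2 * L_car - 2 * L_safe # spacing error to the pilot vehicle
    u_b = (lam1 * (kd * (a[1] - a[2]) + kp * (v[1] - v[2]) + ki * e_front)
           + lam2 * (kd * (a[0] - a[2]) + kp * (v[0] - v[2]) + ki * e_pilot))
    return u_f + u_b
```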
Example 2
The embodiment of the invention provides a vehicle queue tracking control device, which comprises a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the vehicle queue tracking control method is realized as in embodiment 1.
Example 3
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor, and the computer program implements the vehicle queue tracking control method according to embodiment 1.
The invention discloses a vehicle queue tracking control method and device and a computer readable storage medium. The state error vectors of the self vehicle, the front vehicle and the pilot vehicle are acquired, and the policy network and the value network of the DDPG (deep deterministic policy gradient) are established according to these state error vectors; the policy network and the value network of the DDPG are trained to obtain a fully trained DDPG network; the vehicle action values are obtained according to the state error vectors and the fully trained DDPG network, the output control quantity of the decision controller is determined according to the vehicle action values, and the throttle opening of the vehicle is determined according to the output control quantity; the real-time performance and stability of vehicle queue tracking control are thereby improved.
In the above technical solution, for the scenario in which vehicles travel in a queue, it is taken into account that the DDPG network easily falls into a locally optimal solution during training. Based on its population-search characteristic, the PSO algorithm is used to generate a large amount of experience containing implicit temporal-distribution attributes, and the individuals trained by the gradient-based DRL method in turn guide the search direction of the PSO algorithm, so that the PSO algorithm can handle the sparse-reward reinforcement learning problem and converge more quickly. The inertia weight factor is improved so that particles can jump out of locally poor regions, giving the PSO algorithm strong global optimization capability. The DDPG network training method in the technical solution of the invention has the advantages of comprehensive consideration, simple and convenient calculation, fast operation, and high reliability.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A vehicle queue tracking control method is characterized by comprising the following steps:
acquiring the state error vectors of the self vehicle, the front vehicle and the pilot vehicle, and establishing the policy network and the value network of the DDPG according to these state error vectors;
training the policy network and the value network of the DDPG with a particle swarm algorithm to obtain a fully trained DDPG network;
and obtaining the vehicle action values according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle and the fully trained DDPG network, determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity.
2. The vehicle queue tracking control method according to claim 1, wherein establishing the policy network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG policy network, wherein the DDPG policy network comprises a plurality of fully connected layers, and the output layer of the DDPG policy network outputs the vehicle action values.
3. The vehicle queue tracking control method according to claim 2, wherein establishing the value network of the DDPG according to the state error vectors of the self vehicle, the front vehicle and the pilot vehicle specifically comprises: inputting the state error vectors of the self vehicle, the front vehicle and the pilot vehicle at the input layer of the DDPG value network; after passing through the first fully connected layer, the state error vectors are input, together with the vehicle action values, to the second fully connected layer, and the output layer of the DDPG value network outputs the evaluation value (Q value) of executing the optimal action.
4. The vehicle queue tracking control method according to claim 1, wherein training the policy network and the value network of the DDPG with the particle swarm algorithm specifically comprises:
determining the population size and the particle dimension of the particle swarm algorithm, initializing the positions and velocities of the particles, iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG, and training the policy network and the value network of the DDPG with the optimal connection weights.
5. The vehicle queue tracking control method according to claim 4, wherein iteratively updating each particle of the swarm to obtain the optimal connection weights of the DDPG specifically comprises: improving the inertia weight factor, and iteratively updating each particle of the swarm according to the improved inertia weight factor to obtain the optimal connection weights of the DDPG, wherein the improved inertia weight factor is
ω = m + h·logT(T - t - 1)
wherein m is the convergence value of the inertia weight factor ω, h > 0, T is the maximum number of iterations, and t is the current iteration number.
6. The vehicle queue tracking control method according to claim 4, wherein training the policy network and the value network of the DDPG with the optimal connection weights specifically comprises: the policy network performs gradient updating with the deterministic policy gradient based on the evaluation value (Q value) of executing the optimal action, the value network performs gradient updating according to the loss function, and the policy network and the value network of the DDPG are trained with the optimal connection weights.
7. The vehicle queue tracking control method according to claim 6, wherein the reward value of the loss function is R = R1 + R2 + R3 + R4, wherein R1 is the collision-avoidance reward term defined from the longitudinal coordinates x2, x3 and the minimum stationary spacing Lsafe, R4 is the ride-comfort reward term defined from the acceleration weight ω4 and the controller output |a|,
R2 = -ω1|v2 - v3|, R3 = ω2(|Δxt-1| - |Δxt|) - ω3|Δxt|,
and x2, x3 are the longitudinal coordinates of the front vehicle and the self vehicle, respectively; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; ω1 is the weight of the speed error; v2 - v3 is the speed error between the self vehicle and the front vehicle; ω2 is the weight of the change of the inter-vehicle distance error between time t-1 and time t; ω3 is the weight of the inter-vehicle distance error at time t; Δxt is the distance error between the self vehicle and the front vehicle at time t; Δxt-1 is the distance error between the self vehicle and the front vehicle at time t-1; ω4 is the acceleration weight of the vehicle; and |a| is the controller output of the controlled vehicle.
8. The vehicle queue tracking control method according to claim 1, wherein determining the output control quantity of the decision controller based on the vehicle action values and determining the throttle opening of the vehicle based on the output control quantity specifically comprises: determining the output control quantity of the decision controller according to the vehicle action values, and determining the throttle opening of the vehicle according to the output control quantity and the vehicle throttle opening formula, wherein the vehicle throttle opening formula is uthrottle = uthrottle,f + uthrottle,b; uthrottle,f is the ratio of the desired torque to the theoretical torque of the vehicle engine, and uthrottle,b is the output control quantity of the decision controller; for the front vehicle, uthrottle,b = uthrottle,b2, with
uthrottle,b2 = kd(a1 - a2) + kp(v1 - v2) + ki(x1 - x2 - hv2 - L);
for the self vehicle, uthrottle,b = uthrottle,b3, with
uthrottle,b3 = λ1(kd(a2 - a3) + kp(v2 - v3) + ki(x2 - x3 - hv3 - Lcar - Lsafe)) + λ2(kd(a1 - a3) + kp(v1 - v3) + ki(x1 - x3 - 2hv3 - 2Lcar - 2Lsafe))
wherein x1, x2, x3 are the positions of the pilot vehicle, the front vehicle and the self vehicle, respectively; v1, v2, v3 are the speeds of the pilot vehicle, the front vehicle and the self vehicle; h is the inter-vehicle time headway; L and Lcar are the vehicle length; Lsafe is the minimum distance that should be kept between the front vehicle and the self vehicle when both are stationary; and the vehicle action values comprise Kp, Ki and Kd, which are the proportional coefficient, the integral coefficient and the differential coefficient, respectively.
9. A vehicle queue tracking control apparatus comprising a processor and a memory, the memory having stored thereon a computer program that, when executed by the processor, implements the vehicle queue tracking control method according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the vehicle queue tracking control method according to any one of claims 1 to 8.
CN202110402251.2A 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium Active CN113140104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110402251.2A CN113140104B (en) 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN113140104A true CN113140104A (en) 2021-07-20
CN113140104B CN113140104B (en) 2022-06-21

Family

ID=76812585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110402251.2A Active CN113140104B (en) 2021-04-14 2021-04-14 Vehicle queue tracking control method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113140104B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
CN110032189A (en) * 2019-04-22 2019-07-19 河海大学常州校区 A kind of intelligent storage method for planning path for mobile robot not depending on map
CN110329257A (en) * 2019-06-24 2019-10-15 武汉理工大学 A kind of more longitudinally controlled methods of car team team based on Che-Che Tongxin
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110989577A (en) * 2019-11-15 2020-04-10 深圳先进技术研究院 Automatic driving decision method and automatic driving device of vehicle
CN112162555A (en) * 2020-09-23 2021-01-01 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AHMAD PARVARESH et al.: "A Novel Deep Learning Backstepping Controller-Based Digital Twins Technology for Pitch Angle Control of Variable Speed Wind Turbine", MDPI
QIHAO LIU et al.: "PNS: Population-Guided Novelty Search for Reinforcement Learning in Hard Exploration Environments"
XU Yang et al.: "A Unified Modeling Method for Trajectory Planning and Tracking Control of Unmanned Vehicles" (无人车辆轨迹规划与跟踪控制的统一建模方法), Acta Automatica Sinica (自动化学报)
LUO Ying et al.: "Decision-Making for Low-Speed Car-Following Behavior of Vehicles Based on an Improved DDPG Algorithm" (基于改进DDPG算法的车辆低速跟驰行为决策研究), Measurement & Control Technology (测控技术)
LU Huaxiang et al.: "Particle Swarm Optimization Based on Deep Deterministic Policy Gradient" (基于深度确定性策略梯度的粒子群算法), Journal of University of Electronic Science and Technology of China (电子科技大学学报)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114089633A (en) * 2021-11-19 2022-02-25 江苏科技大学 Multi-motor coupling drive control device and method for underwater robot
CN114089633B (en) * 2021-11-19 2024-04-26 江苏科技大学 Multi-motor coupling driving control device and method for underwater robot

Also Published As

Publication number Publication date
CN113140104B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN112193280B (en) Heavy-load train reinforcement learning control method and system
Xu et al. Look-ahead prediction-based real-time optimal energy management for connected HEVs
CN110992695B (en) Vehicle urban intersection traffic decision multi-objective optimization method based on conflict resolution
CN106740846A (en) A kind of electric automobile self-adapting cruise control method of double mode switching
CN103324085A (en) Optimal control method based on supervised reinforcement learning
Ding et al. Driving strategy of connected and autonomous vehicles based on multiple preceding vehicles state estimation in mixed vehicular traffic
CN114312830A (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN113140104B (en) Vehicle queue tracking control method and device and computer readable storage medium
Wang et al. Lane keeping assist for an autonomous vehicle based on deep reinforcement learning
Zhao et al. Supervised adaptive dynamic programming based adaptive cruise control
Selvaraj et al. An ML-aided reinforcement learning approach for challenging vehicle maneuvers
Dubey et al. Autonomous braking and throttle system: A deep reinforcement learning approach for naturalistic driving
Debarshi et al. Robust EMRAN-aided coupled controller for autonomous vehicles
Lin et al. Adaptive prediction-based control for an ecological cruise control system on curved and hilly roads
Kerbel et al. Driver assistance eco-driving and transmission control with deep reinforcement learning
Guo et al. Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control
Fehér et al. Proving ground test of a ddpg-based vehicle trajectory planner
Hailemichael et al. Safe reinforcement learning for an energy-efficient driver assistance system
CN114997048A (en) Automatic driving vehicle lane keeping method based on TD3 algorithm improved by exploration strategy
Chen et al. Decision making for overtaking of unmanned vehicle based on deep Q-learning
CN113759701A (en) High-speed train speed control method and system
CN114228690A (en) Automatic driving vehicle roll control method based on DDPG and iterative control
CN111857112B (en) Automobile local path planning method and electronic equipment
Sun Cooperative adaptive cruise control performance analysis
Zhuang et al. Model-Predictive-Control-Based Simultaneous Trajectory Tracking and Speed Control for Intelligent Vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant