CN111679577B - Speed tracking control method and automatic driving control system of high-speed train

Info

Publication number: CN111679577B
Application number: CN202010461495.3A
Authority: CN
Inventors: 董海荣; 高士根; 王佳成; 郑玥; 李浥东
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2020-05-27
Filing date: 2020-05-27
Publication date: 2021-11-05
Anticipated expiration: 2040-05-27
Also published as: CN111679577A

Abstract

The invention provides a speed tracking control method and an automatic driving control system for a high-speed train. The tracking control method uses an ATO control algorithm that is completely independent of the internal dynamic characteristics of the train control system: based on integral reinforcement learning, it solves for the optimal train speed tracking control strategy by analyzing and using train running state data, and then performs train speed tracking control according to that strategy. This solves the problem of control performance degradation caused by uncertain train dynamics and guarantees that the train control input stays within a preset limit, so that actuator saturation is avoided. Under this tracking control method, the automatic driving control system keeps the control input within the preset limit, thereby avoiding actuator saturation, and controls the train to run along a given target speed-distance curve, realizing automatic driving of the high-speed train.

Description

Speed tracking control method and automatic driving control system of high-speed train

Technical Field

The invention relates to the technical field of automatic driving systems of high-speed trains, in particular to a speed tracking control method and an automatic driving control system of a high-speed train.

Background

As the core of an Automatic Train Operation (ATO) system, the ATO control algorithm plays an important role in guaranteeing the safety and reliability of train operation.

In the design of an ATO control algorithm, a train dynamics model is normally indispensable because it reflects the dynamic characteristics of the controlled object. However, high-speed railways are characterized by high operating speeds and long inter-station running times, and under the influence of a complex operating environment the dynamic characteristics of a high-speed train are time-varying and uncertain. Viewed from inside the train, model parameters differ because different types of multiple-unit trains have different traction and braking characteristics, different locomotive design styles and different marshalling modes. Viewed from external disturbances, the model parameters are affected by uncertain factors such as accumulated operating mileage, varying passenger load and changes in the external environment. During operation, the uncertain train dynamics caused by these complex disturbances degrade speed tracking control performance. To address this, researchers have used techniques such as adaptive control and neural networks to estimate the unknown train model parameters, or functions containing them, and thereby improve the tracking accuracy of the control algorithm. However, because of limited calculation accuracy, estimation errors are unavoidable, so the ideal control effect and performance cannot be achieved; moreover, these techniques require complex formula derivation, which imposes additional computational load and speed requirements that existing on-board computers cannot meet.

Disclosure of Invention

Embodiments of the invention provide a speed tracking control method and an automatic driving control system for a high-speed train, which fundamentally address the influence of external environmental disturbances on the speed tracking control of a high-speed train and help improve its speed tracking accuracy.

In order to achieve the purpose, the invention adopts the following technical scheme.

A speed tracking control method of a high-speed train comprises the following steps:

S1, obtaining the real-time position p(t), the ideal position p_d(t), the real-time velocity v(t) and the ideal velocity v_d(t) of the high-speed train;

S2, differencing the real-time position p(t) and the ideal position p_d(t) to obtain a real-time position error e_p(t); differencing the real-time velocity v(t) and the ideal velocity v_d(t) to obtain a real-time speed error e_v(t);

S3, obtaining a high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t);

S4, solving the high-speed train optimization control model through an integral reinforcement learning algorithm based on an actor-critic neural network structure to obtain an optimal control strategy of the high-speed train; and carrying out speed tracking control on the high-speed train based on the optimal control strategy of the high-speed train.
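Steps S1 and S2 amount to two subtractions per sample, and the resulting errors are stacked with the references to form the composite state of formula (1). The following is a minimal Python sketch; the sign convention (actual minus ideal), the units, and the names `TrainSample`, `tracking_errors` and `composite_state` are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSample:
    """One sample of measured and ideal train state (units assumed: m, m/s)."""
    p: float    # real-time position p(t)
    v: float    # real-time velocity v(t)
    p_d: float  # ideal position p_d(t)
    v_d: float  # ideal velocity v_d(t)


def tracking_errors(s: TrainSample) -> tuple[float, float]:
    """Return (e_p, e_v); the actual-minus-ideal sign is an assumption."""
    return s.p - s.p_d, s.v - s.v_d


def composite_state(s: TrainSample) -> list[float]:
    """Stack errors and references as X = [e_p, e_v, p_d, v_d]^T."""
    e_p, e_v = tracking_errors(s)
    return [e_p, e_v, s.p_d, s.v_d]
```

Keeping the references p_d and v_d inside the state lets a controller depend on where the train should be as well as on how far it has drifted.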

Preferably, step S3 specifically includes:

S31, combining the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) to obtain the high-speed train composite system state X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)]^T (1); according to the train dynamics model

establishing the high-speed train composite system expression

wherein M is the train mass, f_a is the additional running resistance, f_b is the basic running resistance, F(X(t)) is the internal dynamics of the composite system,

is the composite system input dynamics, and u_r is the control input in the reinforcement learning process;

S32, obtaining a composite system value function of the high-speed train based on formulas (1) and (2)

$V(X(t)) = \int_t^{\infty} e^{-\gamma(\tau - t)} \left[ X^T(\tau) Q X(\tau) + U \right] d\tau$ (3)

in the formula,

the discount factor satisfies 0 < γ < 1, ρ(v) is the control input limit with the velocity v as the independent variable, and Q and R are the state weight matrix and the input weight matrix, respectively.

Preferably, the integral reinforcement learning algorithm based on the actor-critic neural network structure includes an actor neural network and a critic neural network, and step S4 specifically includes:

S41, setting the critic neural network weight vector as W₁ and the actor neural network weight vector as W₂; setting the basis function of the critic and actor neural networks as

the first derivative of the basis function

with respect to the high-speed train composite system state X is

S42, updating the critic neural network weight through a first adaptive law, the first adaptive law comprising

and

in the formula, α₁ > 0 is the adaptive estimation rate coefficient;

S43, updating the actor neural network weight through a second adaptive law, the second adaptive law comprising

and

in the formula, α₂ > 0 is the adaptive estimation rate coefficient, and Y > 0 is a control constant;

S44, obtaining the high-speed train control input of integral reinforcement learning based on the updated critic neural network weight and actor neural network weight

applying the high-speed train control input of integral reinforcement learning to the control system of the high-speed train to obtain train running state data at time t + T, where ρ is shorthand for ρ(v);

S45, executing steps S2, S31, S42, S43 and S44 multiple times to obtain an actor neural network weight data set;

S46, analyzing the actor neural network weight data set to obtain the optimal weight vector of the actor neural network

based on the optimal weight vector

obtaining the optimal control strategy of the high-speed train

An automatic driving control system of a high-speed train applying the above speed tracking control method comprises:

the train ideal position acquisition module, used for acquiring the ideal position p_d(t) of the high-speed train in real time;

the train ideal speed acquisition module, used for acquiring the ideal velocity v_d(t) of the high-speed train in real time;

the train positioning module, used for acquiring the real-time position p(t) of the high-speed train;

the train speed measuring module, used for acquiring the real-time velocity v(t) of the high-speed train;

the train control system state acquisition module, which is in communication connection with the train ideal position acquisition module, the train ideal speed acquisition module, the train positioning module and the train speed measuring module respectively, and is used for differencing the real-time position p(t) and the ideal position p_d(t) to obtain the real-time position error e_p(t), differencing the real-time velocity v(t) and the ideal velocity v_d(t) to obtain the real-time speed error e_v(t), and obtaining a high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t);

the control strategy generation module, used for solving the high-speed train optimization control model through an integral reinforcement learning algorithm based on an actor-critic neural network structure to obtain an optimal control strategy of the high-speed train;

and the train control module, used for speed tracking control based on the optimal control strategy of the high-speed train.

Preferably, the process by which the train control system state acquisition module obtains the high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) specifically comprises:

combining the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) to obtain the high-speed train composite system state X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)]^T (1) and the high-speed train composite system expression

in the formula, u_r is the control input in the reinforcement learning process;

obtaining a composite system value function of the high-speed train based on formulas (1) and (2)

$V(X(t)) = \int_t^{\infty} e^{-\gamma(\tau - t)} \left[ X^T(\tau) Q X(\tau) + U \right] d\tau$ (3)

in the formula,

the discount factor satisfies 0 < γ < 1, ρ(v) is the control input limit with the velocity v as the independent variable, and Q and R are the state weight matrix and the input weight matrix, respectively.

Preferably, the control strategy generation module includes:

a critic neural network submodule, used for updating the critic neural network weight through a first adaptive law, the first adaptive law comprising

and

in the formula, W₁ is the critic neural network weight vector,

is the basis function of the critic neural network,

is the first derivative of the basis function of the critic neural network with respect to the high-speed train composite system state X, and α₁ > 0 is the adaptive estimation rate coefficient;

an actor neural network submodule, used for updating the actor neural network weight vector through a second adaptive law, the second adaptive law comprising

and

in the formula, W₂ is the actor neural network weight vector,

is the basis function of the actor neural network,

is the first derivative of the basis function of the actor neural network with respect to the high-speed train composite system state X, α₂ > 0 is the adaptive estimation rate coefficient, and Y > 0 is a control constant;

a train control input submodule, used for obtaining the high-speed train control input of integral reinforcement learning based on the updated critic neural network weight and actor neural network weight

and sending the high-speed train control input of integral reinforcement learning to the train control module;

a neural network weight vector register, used for storing the updated actor neural network weight vectors in real time to obtain an actor neural network weight data set;

an optimal control strategy submodule, used for analyzing the actor neural network weight data set to obtain the optimal weight vector of the actor neural network

and, based on the optimal weight vector,

obtaining the optimal control strategy of the high-speed train

It can be seen from the technical solutions provided by the embodiments of the present invention that the invention provides a speed tracking control method and an automatic driving control system for a high-speed train. The tracking control method uses an ATO control algorithm that is completely independent of the internal dynamic characteristics of the train control system: based on integral reinforcement learning, it solves for the optimal train speed tracking control strategy by analyzing and using train running state data, and then performs train speed tracking control according to that strategy. This solves the problem of control performance degradation caused by uncertain train dynamics and guarantees that the train control input stays within a preset limit, so that actuator saturation is avoided. Under this tracking control method, the automatic driving control system keeps the control input within the preset limit, thereby avoiding actuator saturation, and controls the train to run along a given target speed-distance curve, realizing automatic driving of the high-speed train.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a processing flow chart of the speed tracking control method for a high-speed train according to the present invention;

Fig. 2 is a block diagram of the automatic driving control system of a high-speed train according to the present invention;

Fig. 3 is a logic block diagram of the automatic driving control system of a high-speed train according to the present invention;

Fig. 4 is a schematic diagram of the convergence process of the actor neural network weights in the speed tracking control method for a high-speed train according to the present invention;

Fig. 5 shows the actual and ideal train position curves over the whole train operation;

Fig. 6 shows the actual and ideal train speed curves;

Fig. 7 is the real-time train position error curve;

Fig. 8 is the real-time train speed error curve;

Fig. 9 shows the maximum output force of the motor used in a preferred embodiment of the automatic driving control system of a high-speed train according to the present invention;

Fig. 10 shows the actual train traction/braking force curve.

In the figures:

201. train ideal position obtaining module; 202. train ideal speed obtaining module; 203. train positioning module; 204. train speed measuring module; 205. train control system state obtaining module; 206. control strategy generation module; 2061. critic neural network submodule; 2062. actor neural network submodule; 2063. train control input submodule; 2064. neural network weight vector register; 2065. optimal control strategy submodule; 207. train control module.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.

Referring to fig. 1, the speed tracking control method for a high-speed train provided by the invention comprises the following steps:

S1, acquiring in real time the real-time position p(t), the ideal position p_d(t), the real-time velocity v(t) and the ideal velocity v_d(t) of the high-speed train;

In the embodiment provided by the invention, the ideal position and ideal velocity of the high-speed train are obtained as follows: a train target speed-distance curve is calculated from the movement authority received from the radio block center and the temporary speed restriction command received from the temporary speed restriction server; the inter-station running time is obtained from the dispatching command received from the dispatching center; and the ideal position p_d(t) and the ideal velocity v_d(t) of the high-speed train are then obtained in real time by splitting the target speed-distance curve and combining it with the inter-station running time.
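One way to turn a sampled target speed-distance curve into time-indexed references p_d(t) and v_d(t) can be sketched as follows. This is only an illustration under stated assumptions: the curve is given as arrays of distance and speed, each segment is traversed at the mean of its endpoint speeds (exact for constant acceleration), and reconciling the result with the scheduled inter-station running time is left out. The function names are hypothetical.

```python
import numpy as np


def time_index_curve(s, v):
    """Convert a speed-distance curve v(s) into time stamps t(s).

    Each segment is traversed at the mean of its endpoint speeds,
    which is exact for constant acceleration (an assumption here).
    """
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    mean_v = 0.5 * (v[:-1] + v[1:])
    if np.any(mean_v <= 0.0):
        raise ValueError("curve contains a segment with zero mean speed")
    dt = np.diff(s) / mean_v
    return np.concatenate(([0.0], np.cumsum(dt)))


def ideal_state(t_query, s, v):
    """Return (p_d, v_d) at time t_query by linear interpolation in time."""
    t = time_index_curve(s, v)
    return float(np.interp(t_query, t, s)), float(np.interp(t_query, t, v))
```

The last time stamp returned by `time_index_curve` is the running time implied by the curve itself, which could be compared with the scheduled inter-station running time from the dispatching command.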

Further, the step S3 specifically includes the following sub-steps:

Establishing a high-speed train composite system expression formula

S32, defining the composite system value function of the high-speed train based on formulas (1) and (2)

$V(X(t)) = \int_t^{\infty} e^{-\gamma(\tau - t)} \left[ X^T(\tau) Q X(\tau) + U \right] d\tau$ (3)

in the formula,

the discount factor satisfies 0 < γ < 1, ρ(v) is the control input limit with the velocity v as the independent variable, abbreviated as ρ below for convenience, and Q and R are the state weight matrix and the input weight matrix, respectively.

To minimize the composite system value function of the high-speed train defined in sub-step S32 and thereby obtain the optimal control strategy, the embodiment provided by the present invention uses an integral reinforcement learning algorithm based on an actor-critic neural network structure: the critic neural network evaluates the control strategy, the actor neural network improves the control strategy according to the evaluation result, and the control strategy is updated by reinforcement learning at a time interval T. In this embodiment, step S4 includes the following sub-steps:
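Claim 1 writes the value function as V(X(t)) = ∫_t^∞ e^{-γ(τ-t)} [X^T(τ) Q X(τ) + U] dτ. Splitting this integral at t + T gives V(X(t)) = ∫_t^{t+T} e^{-γ(τ-t)} [X^T Q X + U] dτ + e^{-γT} V(X(t+T)), which involves only data measured over one interval and no train model. The Python sketch below evaluates that interval term and the resulting residual from sampled stage costs. It is an illustration only: the stage cost is passed in as numbers, the trapezoidal rule and the function names are assumptions, and the adaptive laws that the patent uses to update the weights are not reproduced in this text.

```python
import math


def integral_reinforcement(costs, dt, gamma):
    """Trapezoidal estimate of the discounted cost over one interval.

    costs: stage-cost samples r(t), r(t+dt), ..., r(t+T), uniformly spaced.
    Approximates the integral of exp(-gamma*(tau - t)) * r(tau) over [t, t+T].
    """
    weighted = [math.exp(-gamma * k * dt) * r for k, r in enumerate(costs)]
    return dt * (sum(weighted) - 0.5 * (weighted[0] + weighted[-1]))


def bellman_residual(v_now, v_next, reinforcement, gamma, T):
    """Residual of V(t) = reinforcement + exp(-gamma*T) * V(t+T)."""
    return reinforcement + math.exp(-gamma * T) * v_next - v_now
```

A critic that represents V well leaves a residual near zero on every interval, which is why such a residual is a natural training signal in integral reinforcement learning.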

The basis function

And

in the formula, α₁ > 0 is the adaptive estimation rate coefficient, and U is defined by formula (4) above;

and

in the formula, α₂ > 0 is the adaptive estimation rate coefficient, Y > 0 is a control constant, and

is the composite system input dynamics;

applying the high-speed train control input of integral reinforcement learning to the control system of the high-speed train to obtain train operation state data at time t + T;

S45, executing the above steps S2, S31, S42, S43 and S44 multiple times to complete the preset inter-station run and obtain an actor neural network weight data set, which stores the updated actor neural network weights obtained each time step S44 is executed;

S46, analyzing the updated actor neural network weights in the actor neural network weight data set to obtain the optimal weight vector of the actor neural network

based on the optimal weight vector

obtaining the optimal control strategy of the high-speed train

and sending it to the control system of the high-speed train to perform speed tracking control on the high-speed train.
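The text does not say how the weight data set is analyzed in step S46. One plausible reading, sketched below in Python, is to check that the stored actor weights have stopped changing and then take their average over the final samples. The window length, the tolerance and the function name are hypothetical choices for illustration.

```python
import numpy as np


def select_converged_weights(history, window=50, tol=1e-3):
    """Return the mean of the last `window` weight vectors if they have settled.

    history: sequence of actor weight vectors, one per learning interval.
    Returns None when there are too few samples or when the spread of any
    weight component over the window exceeds tol.
    """
    w = np.asarray(history, dtype=float)
    if w.ndim != 2 or len(w) < window:
        return None
    tail = w[-window:]
    spread = np.max(tail.max(axis=0) - tail.min(axis=0))
    return tail.mean(axis=0) if spread <= tol else None
```

Returning None instead of a weight vector makes it explicit that learning should continue before the controller switches to the fixed strategy.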

In a second aspect, the present invention provides an automatic driving control system for a high-speed train to which the above speed tracking control method is applied, as shown in fig. 2 and 3, comprising:

a train ideal position obtaining module 201, used for obtaining the ideal position p_d(t) of the high-speed train in real time;

a train ideal speed obtaining module 202, used for obtaining the ideal velocity v_d(t) of the high-speed train in real time;

a train positioning module 203, used for acquiring the real-time position p(t) of the high-speed train;

a train speed measuring module 204, used for acquiring the real-time velocity v(t) of the high-speed train;

a train control system state obtaining module 205, which is in communication connection with the train ideal position obtaining module 201, the train ideal speed obtaining module 202, the train positioning module 203 and the train speed measuring module 204 respectively, and is used for differencing the real-time position p(t) and the ideal position p_d(t) to obtain the real-time position error e_p(t), differencing the real-time velocity v(t) and the ideal velocity v_d(t) to obtain the real-time speed error e_v(t), and obtaining a high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t);

a control strategy generation module 206, which is in communication connection with the train control system state obtaining module 205 and is used for solving the high-speed train optimization control model through an integral reinforcement learning algorithm based on an actor-critic neural network structure to obtain an optimal control strategy of the high-speed train;

and a train control module 207, which is in communication connection with the control strategy generation module 206 and is used for speed tracking control based on the optimal control strategy of the high-speed train.
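The data flow between modules 201 to 207 can be sketched as a single control step in Python. This is a structural illustration only: each module is represented by an injected callable, the class and parameter names are hypothetical, and the error sign (actual minus ideal) is an assumption.

```python
from typing import Callable, List


class AutoDriveLoop:
    """Wires modules 201-207 together; each module is an injected callable."""

    def __init__(
        self,
        ideal_position: Callable[[float], float],   # module 201
        ideal_speed: Callable[[float], float],      # module 202
        measure_position: Callable[[], float],      # module 203
        measure_speed: Callable[[], float],         # module 204
        policy: Callable[[List[float]], float],     # strategy from module 206
        apply_force: Callable[[float], None],       # module 207
    ) -> None:
        self.ideal_position = ideal_position
        self.ideal_speed = ideal_speed
        self.measure_position = measure_position
        self.measure_speed = measure_speed
        self.policy = policy
        self.apply_force = apply_force

    def step(self, t: float) -> float:
        """Run one control cycle at time t and return the applied input."""
        p_d, v_d = self.ideal_position(t), self.ideal_speed(t)
        p, v = self.measure_position(), self.measure_speed()
        x = [p - p_d, v - v_d, p_d, v_d]  # module 205: composite state
        u = self.policy(x)
        self.apply_force(u)
        return u
```

Injecting the modules as callables keeps the loop independent of how positioning, speed measurement and strategy generation are implemented.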

Further, the process by which the train control system state obtaining module obtains the high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) specifically comprises:

combining the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) to obtain the high-speed train composite system state X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)]^T (1) and the high-speed train composite system expression

in the formula, u_r is the control input in the reinforcement learning process;

in the formula,

the discount factor satisfies 0 < γ < 1, ρ(v) is the control input limit with the velocity v as the independent variable, and Q and R are the state weight matrix and the input weight matrix, respectively.

Further, the control strategy generation module 206 includes:

a critic neural network submodule 2061, used for updating the critic neural network weight through a first adaptive law, the first adaptive law comprising

and

is the basis function of the critic neural network,

is the first derivative of the basis function of the critic neural network

with respect to the high-speed train composite system state X, and α₁ > 0 is the adaptive estimation rate coefficient;

an actor neural network submodule 2062, used for updating the actor neural network weight vector through a second adaptive law, the second adaptive law comprising

and

in the formula, W₂ is the actor neural network weight vector,

is the basis function of the actor neural network,

is the first derivative of the basis function of the actor neural network

with respect to the high-speed train composite system state X, α₂ > 0 is the adaptive estimation rate coefficient, and Y > 0 is a control constant;

a train control input submodule 2063, used for obtaining the high-speed train control input of integral reinforcement learning based on the updated critic neural network weight and actor neural network weight

and sending the high-speed train control input of integral reinforcement learning to the train control module 207;

a neural network weight vector register 2064, used for storing the updated actor neural network weight vectors in real time to obtain an actor neural network weight data set;

an optimal control strategy submodule 2065, used for analyzing the actor neural network weight data set to obtain the optimal weight vector of the actor neural network

and, based on the optimal weight vector,

obtaining the optimal control strategy of the high-speed train

The invention also provides an embodiment for exemplarily showing the working process of the invention.

The train control module is described by the following high-speed train dynamics model

wherein p(t) and v(t) are the position and speed of the train, respectively; during reinforcement learning, u(t) is the control input u_r of step S44, and during speed tracking control, u(t) is the control input u of step S46; the train mass is M = 500 t, and the Davis coefficients are a = 7.75 × 10^-3, b = 2.278 × 10^-4 and c = 1.66 × 10^-5. Other control parameters are:

wherein X₁, X₂, X₃ and X₄ are the components of the composite system state X.
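The dynamics equation itself is not reproduced in this text, so the following Python sketch relies on assumptions that are not taken from the patent: the basic running resistance is taken in the common Davis form a + b·v + c·v² per unit mass (N/kg, with v in m/s), the additional running resistance is omitted, and the state is advanced with a plain forward-Euler step. Only the mass and the three coefficients come from the embodiment.

```python
M_KG = 500e3                            # train mass: 500 t, as stated in the text
A, B, C = 7.75e-3, 2.278e-4, 1.66e-5    # Davis coefficients from the text


def basic_resistance(v: float) -> float:
    """Davis-type basic resistance per unit mass (assumed N/kg, v in m/s)."""
    return A + B * v + C * v * v


def euler_step(p: float, v: float, u: float, dt: float) -> tuple[float, float]:
    """Advance (p, v) by dt under force u in newtons.

    Additional running resistance (grades, curves, tunnels) is omitted.
    """
    accel = u / M_KG - basic_resistance(v)
    return p + v * dt, v + accel * dt
```

A simulation of this kind is what a learning controller would interact with in place of the real train; the controller itself never needs the coefficients.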

Fig. 4 shows the convergence of the actor neural network weights during the integral reinforcement learning process of the present invention; the converged actor neural network weight vector is:

The convergence of the actor neural network weights indicates that the control strategy in integral reinforcement learning has converged to the optimal one. High-speed train speed tracking control is then carried out according to the obtained optimal strategy; that is, the integral reinforcement learning process in the fourth part of the block diagram is replaced by the optimal-strategy control process.

Fig. 5 shows the actual and ideal train positions over the whole train operation, and Fig. 6 shows the actual and ideal train speeds; in Fig. 5 and Fig. 6 the dashed lines are the ideal position and speed, and the solid lines are the actual position and speed. Fig. 7 shows the real-time train position error and Fig. 8 shows the real-time train speed error. Fig. 7 and Fig. 8 show that the optimal speed tracking control strategy obtained by integral reinforcement learning according to the present invention achieves accurate train position and speed tracking.

Fig. 9 shows the maximum output force of the motor used in the simulation of the present invention. The maximum output of the train motor is a function of train speed: for 0 < v < 50 it is constant, and for v ≥ 50 it is inversely proportional to the train speed. In Fig. 10, the solid line is the actual train traction/braking force and the dashed line is the traction/braking force limit; under the control method designed by the present invention, the control input stays within the limit.
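The speed-dependent limit described for Fig. 9 can be written as a small piecewise function, and the check made on Fig. 10 as a comparison against it. In the Python sketch below, the constant limit `f_const` is a hypothetical parameter (the text gives no value), continuity at the knee speed is assumed, and the function names are illustrative.

```python
def max_force(v: float, f_const: float, v_knee: float = 50.0) -> float:
    """Speed-dependent force limit rho(v): flat below v_knee, ~1/v above.

    f_const is a hypothetical constant limit; continuity at v_knee is assumed.
    """
    if v < 0.0:
        raise ValueError("speed must be non-negative")
    return f_const if v < v_knee else f_const * v_knee / v


def within_limit(forces, speeds, f_const: float) -> bool:
    """True if every |force| stays at or below the limit at its speed."""
    return all(abs(u) <= max_force(v, f_const) for u, v in zip(forces, speeds))
```

Applying `within_limit` to a logged force trace and the matching speed trace reproduces, in code, the visual check that the solid line in Fig. 10 stays inside the dashed one.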

In conclusion, the invention provides a speed tracking control method and an automatic driving control system of a high-speed train, wherein the tracking control method designs an ATO control algorithm which is completely independent of the internal dynamic characteristics of a train control system, an optimal train speed tracking control strategy is solved by analyzing and utilizing train running state data based on an integral reinforcement learning technology, and train speed tracking control is carried out according to the optimal train speed tracking control strategy, so that the problem of control performance reduction caused by uncertainty of train dynamic characteristics is solved, and train control input is guaranteed to be restricted within a preset value, so that the saturation phenomenon of an actuator is avoided. The automatic driving control system controls the input to be limited within the preset value under the action of the tracking control method, thereby avoiding the saturation of the actuator, controlling the train to drive according to a given target speed-distance curve and realizing the automatic driving of the high-speed train.

Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.

From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the method according to the embodiments or some parts of the embodiments.

The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant points can be found in the description of the method embodiments. The above-described apparatus and system embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A speed tracking control method of a high-speed train is characterized by comprising the following steps:

S3: obtaining a high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t); this step specifically comprises:

S31: combining the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) to obtain the high-speed train composite system state X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)]^T (1); according to the train dynamics model

Establishing a high-speed train composite system expression formula

is the input dynamics of the composite system, and u_r is the high-speed train control input for integral reinforcement learning;

V(X(t)) = ∫_t^∞ e^{-γ(τ-t)} [X^T(τ)QX(τ) + U] dτ (3), in which

the discount factor satisfies 0 < γ < 1, ρ(v) is the control input limit as a function of the speed v, and Q and R are the state weight matrix and the input weight matrix, respectively;

S4: solving the optimal control model of the high-speed train through an integral reinforcement learning algorithm based on an actor-critic neural network structure to obtain an optimal control strategy of the high-speed train, and performing speed tracking control on the high-speed train based on the optimal control strategy of the high-speed train;

the integral reinforcement learning algorithm based on the actor-critic neural network structure includes an actor neural network and a critic neural network, and step S4 specifically includes:

S41: setting the critic neural network weight vector as W₁ and the actor neural network weight vector as W₂; setting the basis function of the critic neural network and the actor neural network as

The basis function

And

in the formula, α₁ > 0 is the adaptive estimation rate coefficient;

And

in the formula, α₂ > 0 is the adaptive estimation rate coefficient, and the control constant is greater than 0;

applying the high-speed train control input to the high-speed train control system, and acquiring the train running state data at time t + T; where ρ is shorthand for ρ(v);

S45: executing steps S2, S31, S42, S43 and S44 a plurality of times to obtain an actor neural network weight data set;

Based on the optimal weight vector

Obtaining optimal control strategy of high-speed train
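
The discounted performance index of formula (3) can be approximated numerically from sampled running data. The sketch below is a minimal illustration and not the claimed procedure: it assumes a finite sampled trajectory, trapezoidal integration, and that the input penalty term U has already been evaluated at each sample, since its definition is not reproduced in the text above. The names `quadratic_form` and `discounted_cost` are hypothetical.

```python
import math
from typing import List, Sequence


def quadratic_form(x: Sequence[float], Q: Sequence[Sequence[float]]) -> float:
    """Return x^T Q x for a state vector x and a square weight matrix Q."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def discounted_cost(
    times: Sequence[float],
    states: Sequence[Sequence[float]],
    input_penalty: Sequence[float],
    Q: Sequence[Sequence[float]],
    gamma: float,
) -> float:
    """Trapezoidal approximation of formula (3) over a finite sampled horizon.

    states[k] is the composite state X at times[k]; input_penalty[k] is the
    already-evaluated term U at times[k] (an assumption of this sketch).
    """
    if not (len(times) == len(states) == len(input_penalty)):
        raise ValueError("times, states and input_penalty must have equal length")
    t0 = times[0]
    integrand: List[float] = [
        math.exp(-gamma * (t - t0)) * (quadratic_form(x, Q) + u_pen)
        for t, x, u_pen in zip(times, states, input_penalty)
    ]
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (integrand[k] + integrand[k - 1]) * (times[k] - times[k - 1])
    return total
```

Because of the factor e^{-γ(τ-t)}, samples far in the future contribute little, so truncating the horizon introduces only a small error when the horizon is long relative to 1/γ.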

2. An automatic driving control system of a high-speed train, characterized by performing the method of claim 1, comprising:

a train control system state acquisition module, which is in communication connection with the train ideal position acquisition module, the train ideal speed acquisition module, the train positioning module and the train speed measurement module, respectively, and which is used for taking the difference between the real-time position p(t) and the ideal position p_d(t) to obtain the real-time position error e_p(t), taking the difference between the real-time velocity v(t) and the ideal velocity v_d(t) to obtain the real-time speed error e_v(t), and obtaining a high-speed train optimization control model based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t);

the train control module is used for tracking and controlling the speed based on the optimal control strategy of the high-speed train;

the process by which the train control system state acquisition module obtains the optimal control model of the high-speed train based on the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) specifically comprises the following steps:

combining the real-time position error e_p(t), the real-time speed error e_v(t), the ideal position p_d(t) and the ideal velocity v_d(t) to obtain the high-speed train composite system state X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)]^T (1) and the high-speed train composite system expression

in the formula, u_r is the high-speed train control input for integral reinforcement learning;

V(X(t)) = ∫_t^∞ e^{-γ(τ-t)} [X^T(τ)QX(τ) + U] dτ (3), in which

the control strategy generation module includes a control strategy generation module,

And

is the basis function of the critic neural network,

is the first derivative of the critic neural network basis function with respect to the high-speed train composite system state X, and α₁ > 0 is the adaptive estimation rate coefficient;

And

in the formula, W₂ is the actor neural network weight vector,

is the basis function of the actor neural network,

is the first derivative of the actor neural network basis function with respect to the high-speed train composite system state X, α₂ > 0 is the adaptive estimation rate coefficient, and the control constant is greater than 0;

sending the high-speed train control input for integral reinforcement learning to the train control module;

Based on the optimal weight vector

Obtaining optimal control strategy of high-speed train
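
The assembly of the composite system state of formula (1) by the state acquisition module can be sketched as follows. This is a minimal illustration under stated assumptions: the sign convention (measured value minus ideal value) is assumed, because the text only says the quantities are differenced, and the names `TrainSample` and `composite_state` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainSample:
    """One synchronized reading from the positioning, speed and reference modules."""
    position: float        # real-time position p(t)
    speed: float           # real-time velocity v(t)
    ideal_position: float  # ideal position p_d(t)
    ideal_speed: float     # ideal velocity v_d(t)


def composite_state(sample: TrainSample) -> List[float]:
    """Return X(t) = [e_p(t), e_v(t), p_d(t), v_d(t)] as in formula (1).

    Assumed convention: error = measured value - ideal value.
    """
    e_p = sample.position - sample.ideal_position
    e_v = sample.speed - sample.ideal_speed
    return [e_p, e_v, sample.ideal_position, sample.ideal_speed]


if __name__ == "__main__":
    reading = TrainSample(position=1005.0, speed=79.5,
                          ideal_position=1000.0, ideal_speed=80.0)
    print(composite_state(reading))
```

Keeping the ideal position and ideal velocity in the state alongside the errors lets a single vector carry both the tracking error and the reference trajectory.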